Numpy Guide

Manish Patel

Feb 9, 2024

Introduction to NumPy

What is NumPy?

  • NumPy is a Python package; the name stands for ‘Numerical Python’.
  • It is the core library for scientific computing in Python and provides a powerful n-dimensional array object.
  • Sophisticated (broadcasting) functions
  • Tools for integrating C/C++ and Fortran code
  • Useful linear algebra, Fourier transform, and random number capabilities
  • NumPy arrays can also be used as an efficient multi-dimensional container for generic data.

Key Points

  • NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically).
  • Changing the size of an ndarray creates a new array and deletes the original.
  • All elements in a NumPy array must have the same data type, so each occupies the same amount of memory (homogeneous).
  • NumPy arrays facilitate advanced mathematical and other operations on large amounts of data (faster).
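The first and third points can be checked with a small sketch. The variable names are illustrative only; the example assumes nothing beyond a standard NumPy install.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)            # returns a NEW array; `a` keeps its size

print(a.size, b.size)          # 3 4
print(np.shares_memory(a, b))  # False

# Mixing ints and floats upcasts every element to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)             # float64
```

Because "growing" an array always allocates a new one, repeatedly appending inside a loop tends to be slow; building a list first and converting once is usually the simpler choice.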

NumPy Array

A NumPy array is a powerful N-dimensional array object organized in rows and columns. We can initialize NumPy arrays from nested Python lists and access their elements.

N-dimensional Array

  • 1Dimensional(1D) Array
  • 2Dimensional(2D) Array
  • 3Dimensional(3D) Array

NdArray

NumPy Ecosystem

  • statsmodels: Estimates statistical models and performs statistical tests.
  • scikit-image: Collection of algorithms for image processing.
  • scikit-learn: Simple and efficient tools for machine learning.
  • pandas: Data analysis and manipulation.
  • matplotlib: Plotting library for 2D graphs and visualizations.

Why Numpy?

  • Less Memory
  • Fast
  • Convenient

Why is NumPy Fast?

Vectorization

Vectorization describes the absence of any explicit looping, indexing, etc., in the code. These things still take place, of course, but behind the scenes in optimized, pre-compiled C code.
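As a rough illustration, the two computations below produce the same result; the second has no explicit loop in the Python code. The names `squares_loop` and `squares_vec` are made up for this sketch.

```python
import numpy as np

values = np.arange(5)

# Explicit Python loop: one element at a time
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorized: the loop runs inside NumPy's compiled code
squares_vec = values ** 2

print(squares_vec)                                # [ 0  1  4  9 16]
print(np.array_equal(squares_loop, squares_vec))  # True
```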

Broadcasting

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
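A small sketch of broadcasting, with arrays invented for illustration. Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

row_sum = matrix + row   # row is stretched across both rows
col_sum = matrix + col   # col is stretched across all three columns
print(row_sum)
print(col_sum)

# Shapes (2, 3) and (2,) are not compatible: trailing dimensions 3 and 2 differ
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```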

Getting Started

Use the following import convention

import numpy as np

Calculation

  • The + operator concatenates Python lists instead of adding them element-wise. NumPy arrays support element-wise addition directly, which is one of their main advantages.
import numpy as np
# add 2 lists 
L1 = [1, 2, 3]
L2 = [4, 5, 6]
print(L1+L2)
[1, 2, 3, 4, 5, 6]
# element wise sum using numpy array 
import numpy as np 
A1 = np.array([1, 2, 3])
A2 = np.array([4, 5, 6])
print(A1+A2)
[5 7 9]

Less Memory

import numpy as np
import time
import sys
S = range(1000)
print("Python List: ", sys.getsizeof(5)*len(S))
 
D = np.arange(1000)
print("Numpy Array: ", D.size*D.itemsize)
Python List:  28000
Numpy Array:  4000

Faster

import time
import sys
 
SIZE = 1000000
 
L1 = range(SIZE)
L2 = range(SIZE)
A1 = np.arange(SIZE)
A2 = np.arange(SIZE)
 
start= time.time()
result = [x + y for x, y in zip(L1, L2)]
# time in ms 
print((time.time()-start)*1000)
 
start = time.time()
result = A1+A2
# time in ms 
print((time.time()-start)*1000)
226.06158256530762
31.348228454589844

MAGIC COMMAND

%timeit sum(range(1000))
17.8 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit np.sum(np.arange(1000))
10.4 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
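%timeit is an IPython/Jupyter magic command and is not available in a plain Python script. A comparable measurement can be sketched with the standard-library timeit module; the run count below is an arbitrary choice, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n_runs = 1000  # arbitrary number of repetitions for this sketch

py_time = timeit.timeit("sum(range(1000))", number=n_runs)
np_time = timeit.timeit("np.sum(np.arange(1000))",
                        globals={"np": np}, number=n_runs)

# Average time per call, in microseconds
print(f"Python sum: {py_time / n_runs * 1e6:.2f} µs")
print(f"NumPy sum:  {np_time / n_runs * 1e6:.2f} µs")
```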

Creating Arrays

  • Array: Ordered collection of elements of basic data types of given length.
  • Syntax
np.array(object)
# import numpy 
import numpy as np 

1D Array

# Creating 1D array
A = np.array([1, 2, 3])
A 
array([1, 2, 3])

2D Array

# Creating 2D array
B = np.array([[1, 2, 3], [3, 4, 5]])
B 
array([[1, 2, 3],
       [3, 4, 5]])

3D Array

# Creating 3D array
C = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
C 
array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists: one-dimensional arrays are printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.

# Printing arrays 
X = np.array([1, 2, 3, 4, 5])
print(X)
[1 2 3 4 5]

LARGE ARRAY

If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners

# Printing a large array
print(np.arange(10000))
# Set threshold
np.set_printoptions(threshold = 2**32)
# Printing large arrays 
print(np.arange(10000)) 
# type 
print(type(A))

Array with Categorical Entities

  • Numpy can handle different categorical entities.
  • All elements are coerced into same data type
# create an array with categorical entities. 
X = np.array([12, 13, "n"])
print(X)
['12' '13' 'n']
# type 
print(type(X))
<class 'numpy.ndarray'>
# Creating 2D array
A2 = np.array([[3, 4, 5], [7, 8, 9]])
print(A2) 
[[3 4 5]
 [7 8 9]]
# Creating 3D array
A3 = np.array([[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]])
print(A3) 
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]

Inspecting array properties

Size

  • Returns number of elements in array
  • Syntax: array.size
A1 = np.array([1, 2, 3,4, 5])
# size 
A1.size
5

Shape

  • Returns dimensions of array (rows,columns)
  • Syntax: array.shape
A2 = np.array([[4, 5, 6], [7, 8, 9]])
# shape 
A2.shape 
(2, 3)
# get row 
A2.shape[0]
2
# get column
A2.shape[1]
3

Data Type

  • Returns type of elements in array
  • Syntax: array.dtype
A3 = np.linspace(0, 100, 6)
# dtypes 
A3.dtype
dtype('float64')

Type Conversion

  • Convert array elements to type dtype
  • Syntax: array.astype(dtype)
    • dtype - data type
A4 = np.ones((2,3))
# convert 
A4.astype(np.float16)
array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float16)

Numpy array to Python List

  • Returns the Python list
  • Syntax: array.tolist()
A5 = np.linspace(0, 100, 20)
# array to list 
A5.tolist() 
[0.0,
 5.2631578947368425,
 10.526315789473685,
 15.789473684210527,
 21.05263157894737,
 26.315789473684212,
 31.578947368421055,
 36.8421052631579,
 42.10526315789474,
 47.36842105263158,
 52.631578947368425,
 57.89473684210527,
 63.15789473684211,
 68.42105263157896,
 73.6842105263158,
 78.94736842105263,
 84.21052631578948,
 89.47368421052633,
 94.73684210526316,
 100.0]

Get Help: View documentation

  • Displays the documentation for a function
  • Syntax: np.info(np.function)
    • function - linspace, logspace, eye, ones, zeros etc.
np.info(np.linspace)

Generate arrays using zeros()

  • Returns an array of given shape and type filled with zeros
  • Syntax: np.zeros(shape, dtype)
    • shape - integer or sequence of integers
    • dtype - data type(default: float)
# 1D array of length 3 with all values 0 
Z1 = np.zeros(3)
print(Z1)
[0. 0. 0.]
# 2D array of 3x4 with all values 0 
Z2 = np.zeros((3,4))
print(Z2)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Generate arrays using ones()

  • Returns an array of given shape and type filled with ones
  • Syntax: np.ones(shape, dtype)
    • shape - integer or sequence of integers
    • dtype - data type(default: float)
# 1D array of length 3 with all values 1
A1 = np.ones(3)  
print(A1) 
[1. 1. 1.]

Note: rows = 3, columns = 4

# 2D array of 3x4 with all values 1
A2 = np.ones((3,4))
A2
print(A2) 
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Generate arrays using arange()

  • Returns evenly spaced numbers within the given range based on the step size.
  • Syntax: np.arange(start, stop, step)
    • start - start of the interval (default: 0)
    • stop - end of the interval (not included in the output)
    • step - spacing between values (default: 1)
# not specify start and step 
A1 = np.arange(10)
print(A1)
[0 1 2 3 4 5 6 7 8 9]
# specifying start and step 
A2 = np.arange(start=1, stop=10, step=2)
print(A2)
[1 3 5 7 9]
# another way 
A3 = np.arange(10, 25, 2)
print(A3)
[10 12 14 16 18 20 22 24]

Generate arrays using linspace()

  • Returns evenly spaced numbers within the given range based on the number of samples.
  • Syntax: np.linspace(start, stop, num, endpoint, retstep, dtype)
    • start - start of the interval
    • stop - end of the interval
    • num - number of samples to generate (default: 50)
    • endpoint - if True (the default), stop is the last sample
    • retstep - if True, return the step size along with the samples
    • dtype - type of the output array
# array of evenly spaced values 0 to 2, here sample size = 9
L1 = np.linspace(0,2,9)
print(L1)
[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
# Array of 6 evenly divided values from 0 to 100
L2 = np.linspace(0, 100, 6)
print(L2) 
[  0.  20.  40.  60.  80. 100.]

More examples

# Array of 1 to 5
L3 = np.linspace(start=1, stop=5, endpoint=True, retstep=False)
print(L3, end = " ") 
[1.         1.08163265 1.16326531 1.24489796 1.32653061 1.40816327
 1.48979592 1.57142857 1.65306122 1.73469388 1.81632653 1.89795918
 1.97959184 2.06122449 2.14285714 2.2244898  2.30612245 2.3877551
 2.46938776 2.55102041 2.63265306 2.71428571 2.79591837 2.87755102
 2.95918367 3.04081633 3.12244898 3.20408163 3.28571429 3.36734694
 3.44897959 3.53061224 3.6122449  3.69387755 3.7755102  3.85714286
 3.93877551 4.02040816 4.10204082 4.18367347 4.26530612 4.34693878
 4.42857143 4.51020408 4.59183673 4.67346939 4.75510204 4.83673469
 4.91836735 5.        ] 
# Array of 1 to 5
L4 = np.linspace(start=1, stop=5, endpoint=True, retstep=True)
print(L4) 
(array([1.        , 1.08163265, 1.16326531, 1.24489796, 1.32653061,
       1.40816327, 1.48979592, 1.57142857, 1.65306122, 1.73469388,
       1.81632653, 1.89795918, 1.97959184, 2.06122449, 2.14285714,
       2.2244898 , 2.30612245, 2.3877551 , 2.46938776, 2.55102041,
       2.63265306, 2.71428571, 2.79591837, 2.87755102, 2.95918367,
       3.04081633, 3.12244898, 3.20408163, 3.28571429, 3.36734694,
       3.44897959, 3.53061224, 3.6122449 , 3.69387755, 3.7755102 ,
       3.85714286, 3.93877551, 4.02040816, 4.10204082, 4.18367347,
       4.26530612, 4.34693878, 4.42857143, 4.51020408, 4.59183673,
       4.67346939, 4.75510204, 4.83673469, 4.91836735, 5.        ]), 0.08163265306122448)

Specifying Endpoint

  • endpoint=True: include 5
  • endpoint=False: exclude 5

Specifying Retstep

  • retstep=False: doesn’t return the step value
  • retstep=True: returns the samples as well as the step value
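The effect of both options is easier to see with only a few samples. The sketch below uses num=5 (an arbitrary choice) instead of the default of 50.

```python
import numpy as np

with_end = np.linspace(1, 5, num=5)                      # stop is included
without_end = np.linspace(1, 5, num=5, endpoint=False)   # stop is excluded
samples, step = np.linspace(1, 5, num=5, retstep=True)   # also returns the step

print(with_end)      # [1. 2. 3. 4. 5.]
print(without_end)   # [1.  1.8 2.6 3.4 4.2]
print(step)          # 1.0
```

Note that excluding the endpoint changes the spacing (0.8 instead of 1.0), because the same number of samples now covers a half-open interval.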

Generate arrays using logspace()

  • Returns numbers spaced evenly on a log scale.
  • Syntax: np.logspace(start, stop, num, endpoint, base, dtype)
    • start - base ** start is the first value of the sequence
    • stop - base ** stop is the final value of the sequence (when endpoint is True)
    • num - number of samples to generate (default: 50)
    • endpoint - if True (the default), stop is the last sample
    • base - base of the log space (default: 10.0)
    • dtype - type of the output array

Examples - logspace

# generate an array with 5 samples with base 10.0 
np.logspace(1, 10, num=5, endpoint=True)
array([1.00000000e+01, 1.77827941e+03, 3.16227766e+05, 5.62341325e+07,
       1.00000000e+10])
# generate an array with 5 samples with base 2.0
np.logspace(1, 10, num=5, endpoint=True, base=2.0)
array([   2.        ,    9.51365692,   45.254834  ,  215.2694823 ,
       1024.        ])
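One way to read logspace() is as linspace() applied to the exponents. The sketch below checks that equivalence for the base-2 example; the variable names are illustrative.

```python
import numpy as np

log_samples = np.logspace(1, 10, num=5, base=2.0)

# Evenly spaced exponents, then raise the base to each of them
exponents = np.linspace(1, 10, num=5)
manual = 2.0 ** exponents

print(exponents)                         # [ 1.    3.25  5.5   7.75 10.  ]
print(np.allclose(log_samples, manual))  # True
```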

Generate constant arrays using full()

  • Return a new array of given shape and type, filled with fill_value.
  • Syntax: np.full(shape,fill_value, dtype)
    • shape - Shape of the new array, e.g., (2, 3) or 2.
    • fill_value - Fill value (scalar).
    • dtype - The desired data-type for the array
# generate 2x2 constant array, constant = 7
C = np.full((2, 2), 7)
print(C)
[[7 7]
 [7 7]]

Creating identity matrix using eye()

  • An array where all elements are equal to zero, except for the k-th diagonal, whose values are equal to one
  • Syntax: np.eye(N, M, k, dtype)
    • N : Number of rows(int) in the output
    • M : Number of columns in the output. If None, defaults to N.
    • k : Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal
    • dtype: Data-type of the returned array.
# generate 2x2 identity matrix 
I = np.eye(2)
print(I) 
[[1. 0.]
 [0. 1.]]
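The M and k parameters are not used in the example above. A short sketch of both, with shapes chosen arbitrarily:

```python
import numpy as np

# Ones on the first diagonal above the main one (k=1)
upper = np.eye(3, k=1)
print(upper)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]

# Rectangular 2x3 result (N=2, M=3) with an integer dtype
rect = np.eye(2, 3, dtype=int)
print(rect)
# [[1 0 0]
#  [0 1 0]]
```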

Generate arrays using random.rand()

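  • Returns an array of the given shape filled with random values drawn uniformly from [0, 1).
  • Syntax: np.random.rand(d0, d1, ..., dn)
    • d0, d1, ..., dn - the dimensions of the array, passed as separate integers (np.random.random() takes a shape tuple instead)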
# create an array with randomly generated 5 values 
R = np.random.rand(5)
print(R)
[0.81734336 0.92165032 0.58365094 0.22109209 0.63737478]
# generate 2x2 array of random values 
R1 = np.random.random((2, 2))
print(R1)
[[0.90979081 0.8862896 ]
 [0.03368416 0.89947562]]

more examples

# generate 4x5 array of random floats between 0-1
R2 = np.random.rand(4,5)
print(R2)
[[0.75300779 0.36344641 0.30953108 0.84695505 0.42528285]
 [0.81273348 0.82219856 0.03325799 0.44774794 0.8389878 ]
 [0.41626324 0.23559528 0.76594982 0.70132042 0.60295411]
 [0.61553901 0.78857611 0.77335201 0.16205083 0.87425881]]
# generate 6x7 array of random floats between 0-100
R3 = np.random.rand(6,7)*100
print(R3)
[[1.54884886e+01 2.22758063e+01 6.26157575e+01 3.36928618e+01
  3.92199658e+01 9.55277808e+01 9.92941007e+01]
 [8.17944585e+01 8.50172709e+01 3.50931455e+01 9.48876561e+01
  8.57677696e+01 7.11938563e+01 9.43478011e+01]
 [5.79938042e+01 8.10532498e+01 1.51207225e+01 5.52948717e+01
  4.24288778e+01 1.32168619e+01 2.38567363e+00]
 [9.24917524e-02 7.48348570e+01 8.33249428e+01 4.51109040e+01
  3.06106890e+01 9.00640398e+01 1.85077679e+01]
 [4.16978708e+01 5.36294829e+01 4.81618167e+00 4.81639838e+01
  7.12982899e+01 7.06559008e+01 1.73322906e+01]
 [6.73487371e+01 5.43246986e+00 2.99953686e+00 2.03664622e+01
  6.49524747e+01 7.07364923e+01 9.19542908e+00]]
# generate 2x3 array of random ints between 0-4
R4 = np.random.randint(5, size=(2,3))
print(R4)
[[2 0 1]
 [3 2 0]]
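The outputs above change on every run. When reproducible values are needed, one option is NumPy's Generator interface with a fixed seed. This is a sketch of an alternative to the np.random.rand() calls used in this guide, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
floats = rng.random((2, 2))              # uniform floats in [0, 1)
ints = rng.integers(0, 5, size=(2, 3))   # ints from 0 to 4 (5 is excluded)

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
floats_again = rng_again.random((2, 2))

print(np.array_equal(floats, floats_again))  # True
```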

Generate empty arrays using empty()

  • Return a new array of given shape and type, without initializing entries.
  • Syntax: np.empty(shape, dtype)
    • shape - integer or tuple of integer
    • dtype - data-type
# generate an empty array (entries are uninitialized, so the printed values are arbitrary)
E1 = np.empty(2) 
print(E1)
[1. 1.]
# 2x2 empty array
E2 = np.empty((2, 2)) 
print(E2)
[[0.90979081 0.8862896 ]
 [0.03368416 0.89947562]]

Arrays using specific data type

  • float16
  • float32
  • int8

SEE MORE - https://numpy.org/devdocs/user/basics.types.html

# generate an array of floats 
D = np.ones((2, 3, 4), dtype=np.float16)
D
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float16)

NumPy Data Types

Data Types in Python

  • strings - used to represent text data, the text is given under quote marks. eg. “ABCD”
  • integer - used to represent integer numbers. eg. -1, -2, -3
  • float - used to represent real numbers. eg. 1.2, 42.42
  • boolean - used to represent True or False.
  • complex - used to represent a number in the complex plane. eg. 1.0 + 2.0j, 1.5 + 2.5j

Data Types in NumPy

  • i - integer
  • b - boolean
  • u - unsigned integer
  • f - float
  • c - complex float
  • m - timedelta
  • M - datetime
  • O - object
  • S - string
  • U - unicode string
  • V - fixed chunk of memory for other type ( void )
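These one-letter codes are what the kind attribute of a dtype reports. The sketch below inspects a few dtypes; the type strings were picked for illustration. One caveat: the string 'b' passed to np.dtype() means a signed byte (int8), so the boolean dtype is created from bool here.

```python
import numpy as np

specs = ["i4", "u1", "f8", "c16", "U5", "S3", "M8[D]"]
kinds = {spec: np.dtype(spec).kind for spec in specs}
print(kinds)

# Boolean kind: build the dtype from `bool`, not from the string 'b'
print(np.dtype(bool).kind)     # b
print(np.dtype("b").name)      # int8

# itemsize is the number of bytes per element
print(np.dtype("i4").itemsize, np.dtype("U5").itemsize)  # 4 20
```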

Brief Overview of NumPy Data Types

Data Type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64 (alias removed in NumPy 2.0; use float64)
float16       Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats

Creating NumPy Array

import numpy as np 
# Create a numpy array 
A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(A)
[1 2 3 4 5 6 7 8 9]
# Create a numpy array with strings 
S = np.array(["Jim", "Tim", "Mim"])
print(S)
['Jim' 'Tim' 'Mim']

Checking the Data Type of an Array

The NumPy array object has a property called dtype that returns the data type of the array:

# Check the data type of `A`
A.dtype
dtype('int32')
# Check the data type of `S`
S.dtype
dtype('<U3')

Creating Arrays With a Defined Data Type

We use the array() function to create arrays. This function takes an optional argument, dtype, that lets us define the expected data type of the array elements.

# Create an array of integers 
B = np.array([1, 2, 3, 4, 5], dtype='int')
print(B)
[1 2 3 4 5]
# Check data type of `B`
B.dtype
dtype('int32')
# Create an array of floats 
C = np.array([1, 2, 3, 4, 5], dtype='float32')
print(C)
[1. 2. 3. 4. 5.]
# Check the data type of `C`
C.dtype
dtype('float32')

Find Byte Size of an Array

# Check byte size of `A`
A.nbytes
36
# Check byte size of `S`
S.nbytes
36
# Check byte size of `B`
B.nbytes
20
# Check byte size of `C`
C.nbytes
20

Converting Data Type on Existing Arrays using astype()

# Create an array of floats 
arr = np.array([1, 2, 3, 4, 5], dtype='float32')
print(arr)
[1. 2. 3. 4. 5.]
# Convert the array into integer using astype()
new_arr = arr.astype('int')
print(new_arr)
[1 2 3 4 5]
# Now check the data type again 
new_arr.dtype
dtype('int32')
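The example above converts whole numbers, so nothing is lost. With fractional values, astype() to an integer type drops the fractional part (truncation toward zero) rather than rounding. A small sketch with made-up values:

```python
import numpy as np

prices = np.array([1.7, 2.2, -1.7])

truncated = prices.astype(np.int64)           # fractional part is dropped
rounded = np.round(prices).astype(np.int64)   # round first, then convert

print(truncated)   # [ 1  2 -1]
print(rounded)     # [ 2  2 -2]
print(prices)      # [ 1.7  2.2 -1.7]  (astype returns a new array)
```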

1D Array Indexing

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences. The (start:stop:step) notation for slicing is used.

  • 1D array at index i
  • Returns the ith element of an array
  • Syntax: array[i]
# Create a 1D array
A1 = np.array([11, 22, 34, 12, 15])
# Select ith element of A1 
A1[1]
22
# Negative indexing 
A1[-1]
15

2D Array Indexing

  • 2D array at index [i][j]
  • Returns the [i][j] element of an array
  • Syntax: array[i][j]
# Create a 2D array
A2 = np.array([[0, 1, 3], [4, 6, 7]])
# Select the first row of A2 
A2[0]
array([0, 1, 3])
# Select the first element of first row 
A2[0][0]
0

Note

  • First, A2[0] = [0, 1, 3], which is the first row of array A2
  • Second, A2[0][0] selects the first element of that row.
# Select the second row of A2 
A2[1]
array([4, 6, 7])
# Select the 3rd element of second row 
A2[1][2]
7
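NumPy also accepts both indices inside a single pair of brackets, which avoids creating the intermediate row array. The sketch below reuses the A2 values from above.

```python
import numpy as np

A2 = np.array([[0, 1, 3], [4, 6, 7]])

chained = A2[1][2]    # index the row, then index into that row
direct = A2[1, 2]     # one indexing step with a (row, column) tuple
third_column = A2[:, 2]

print(chained, direct)   # 7 7
print(third_column)      # [3 7]
```

The tuple form also combines with slices, as in A2[:, 2], which the chained form cannot express.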

Consider an array students that contains the students' names and their test scores in two courses. Because the array mixes strings and numbers, every element is stored as a string.

students = np.array([['Alice','Beth','Cathy','Dorothy'],
                     [65,78,90,81],
                     [71,82,79,92]])
students
array([['Alice', 'Beth', 'Cathy', 'Dorothy'],
       ['65', '78', '90', '81'],
       ['71', '82', '79', '92']], dtype='<U11')
students[0]
array(['Alice', 'Beth', 'Cathy', 'Dorothy'], dtype='<U11')
students[1]
array(['65', '78', '90', '81'], dtype='<U11')
students[2]
array(['71', '82', '79', '92'], dtype='<U11')
students[0,1]
'Beth'

Array Slicing

1D Array Slicing

# Create a 1D Array 
A = np.array([11, 12, 13, 14, 15])
# Select all elements 
A[:]
array([11, 12, 13, 14, 15])
# Select from index 1 up to, but not including, index 2
A[1:2]
array([12])
# Select all except last element 
A[:-1]
array([11, 12, 13, 14])

2D Array slicing

A 2D slice takes one slice per axis, separated by a comma. For example, array[0:2, 2:4] selects rows 0 and 1 and columns 2 and 3.

# Create a 2D array of students info 
students = np.array([['Alice','Beth','Cathy','Dorothy'],
                     [65,78,90,81],
                     [71,82,79,92]])
# All rows and column 1
students[:,1:2]
array([['Beth'],
       ['78'],
       ['82']], dtype='<U11')
# All rows, columns 1 and 2
students[:,1:3]
array([['Beth', 'Cathy'],
       ['78', '90'],
       ['82', '79']], dtype='<U11')
# All columns, rows 0 and 1
students[0:2,:]
array([['Alice', 'Beth', 'Cathy', 'Dorothy'],
       ['65', '78', '90', '81']], dtype='<U11')

# All rows and columns
students[:]
array([['Alice', 'Beth', 'Cathy', 'Dorothy'],
       ['65', '78', '90', '81'],
       ['71', '82', '79', '92']], dtype='<U11')
# The last row
students[-1,:]
array(['71', '82', '79', '92'], dtype='<U11')
# 3rd from last to second from last row, last two columns
students[-3:-1,-2:]
array([['Cathy', 'Dorothy'],
       ['90', '81']], dtype='<U11')

Dots or ellipsis(…)

Slicing can also include ellipsis (…) to make a selection tuple of the same length as the dimension of an array. The dots (…) represent as many colons as needed to produce a complete indexing tuple

Select row 0 and all columns. This is equivalent to students[0] or students[0,:].

students[0,...] 
array(['Alice', 'Beth', 'Cathy', 'Dorothy'], dtype='<U11')
# All rows and column 1 
students[...,1]
array(['Beth', '78', '82'], dtype='<U11')
students[...,1].shape
(3,)
students[:,1:2].shape
(3, 1)
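The ellipsis becomes more useful with more than two dimensions, where it stands in for several colons at once. A sketch with an arbitrary 3D array:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

first_of_last_axis = cube[..., 0]   # same as cube[:, :, 0]
second_block = cube[1, ...]         # same as cube[1, :, :]

print(first_of_last_axis.shape)     # (2, 3)
print(second_block.shape)           # (3, 4)
print(np.array_equal(first_of_last_axis, cube[:, :, 0]))  # True
```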

Fancy Indexing - Integer Arrays

NumPy arrays can be indexed with slices, but also with integer or boolean arrays (masks). Passing an array of indices to access multiple array elements at once is called fancy indexing. It creates copies, not views.

a = np.arange(12)**2   
a
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])

Suppose we want to access three different elements. We could do it like this:

a[2],a[6],a[8]
(4, 36, 64)

Alternatively, we can pass a single list or array of indices to obtain the same result:

indx_1 = [2,6,8]
a[indx_1]
array([ 4, 36, 64])

When using fancy indexing, the shape of the result reflects the shape of the index arrays rather than the shape of the array being indexed

indx_2 = np.array([[2,4],[8,10]])
indx_2
array([[ 2,  4],
       [ 8, 10]])
a[indx_2]
array([[  4,  16],
       [ 64, 100]])

We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.

food = np.array([["blueberry","strawberry","cherry","blackberry"],
                 ["pinenut","hazelnuts","cashewnut","coconut"],
                 ["mustard","paprika","nutmeg","clove"]])
food
array([['blueberry', 'strawberry', 'cherry', 'blackberry'],
       ['pinenut', 'hazelnuts', 'cashewnut', 'coconut'],
       ['mustard', 'paprika', 'nutmeg', 'clove']], dtype='<U10')

We will now select the corner elements of this array

row = np.array([[0,0],[2,2]])
col = np.array([[0,3],[0,3]])
food[row,col]
array([['blueberry', 'blackberry'],
       ['mustard', 'clove']], dtype='<U10')

Notice that the first value in the result is food[0,0], next is food[0,3] , food[2,0] and lastly food[2,3]

food[2,0]
'mustard'

Modifying Values with Fancy Indexing

Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array.

food[row,col] = "000000"
food
array([['000000', 'strawberry', 'cherry', '000000'],
       ['pinenut', 'hazelnuts', 'cashewnut', 'coconut'],
       ['000000', 'paprika', 'nutmeg', '000000']], dtype='<U10')

We can use any assignment-type operator for this. Consider following example:

a
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])
indx_1
[2, 6, 8]
a[indx_1] = 999
a
array([  0,   1, 999,   9,  16,  25, 999,  49, 999,  81, 100, 121])
a[indx_1] -=100
a
array([  0,   1, 899,   9,  16,  25, 899,  49, 899,  81, 100, 121])
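Fancy indexing on the right-hand side returns a copy, whereas a slice returns a view. The difference matters once the result is modified. A minimal sketch with illustrative names:

```python
import numpy as np

a = np.arange(6)

sliced = a[1:4]         # view: shares memory with `a`
fancy = a[[1, 2, 3]]    # copy: independent of `a`

sliced[0] = 100         # also changes a[1]
fancy[1] = -1           # leaves `a` untouched

print(a)                            # [  0 100   2   3   4   5]
print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False
```

Assignment through fancy indexing, as in a[indx_1] = 999 above, still writes into the original array, because no intermediate copy is kept.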

Fancy Indexing - Boolean Arrays

When we index arrays with arrays of (integer) indices we are providing the list of indices to pick. With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don’t.

Frequently this type of indexing is used to select the elements of an array that satisfy some condition

a = np.arange(16).reshape(4,4)
a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Now we find the elements that are greater than 9. The comparison returns a boolean NumPy array of the same shape as our original array.

indx_bool = a > 9
indx_bool
array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True,  True],
       [ True,  True,  True,  True]])

We use this array to select elements in a corresponding to ‘true’ values in the boolean array.

a[indx_bool]
array([10, 11, 12, 13, 14, 15])

We can do all of the above in a single concise statement

print(a[a > 9])
[10 11 12 13 14 15]

Counting

How many values less than 6?

a < 6
array([[ True,  True,  True,  True],
       [ True,  True, False, False],
       [False, False, False, False],
       [False, False, False, False]])
np.count_nonzero(a < 6)
6
np.sum(a < 6)
6

How many values less than 6 in each row?

np.sum(a < 6, axis=1)
array([4, 2, 0, 0])

Are there any values greater than 8?

np.any(a > 8)
True

Are all values less than 10?

np.all(a < 10)
False

Are all values less than 100?

np.all(a < 100)
True

Are all values in each row less than 9?

np.all(a < 9, axis=1)
array([ True,  True, False, False])
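Several conditions can be combined into one mask with the element-wise operators & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than < and >. A sketch using the same 4x4 array:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

between = a[(a > 4) & (a < 9)]     # greater than 4 AND less than 9
outside = a[(a < 2) | (a > 13)]    # less than 2 OR greater than 13
not_small = a[~(a < 12)]           # NOT less than 12

print(between)     # [5 6 7 8]
print(outside)     # [ 0  1 14 15]
print(not_small)   # [12 13 14 15]
```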

Structured Arrays

Structured arrays or record arrays are useful when you perform computations, and at the same time you could keep closely related data together. Structured arrays provide efficient storage for compound, heterogeneous data.

NumPy also provides powerful capabilities to create arrays of records, as multiple data types live in one NumPy array. However, one principle in NumPy that still needs to be honored is that the data type in each field (think of this as a column in the records) needs to be homogeneous.

Imagine that we have several categories of data on a number of students say, name, roll number, and test scores.

name  = ["Alice","Beth","Cathy","Dorothy"]
studentId  = [1,2,3,4]
score = [85.4,90.4,87.66,78.9]

There’s nothing here that tells us that the three lists are related; it would be more natural if we could use a single structure to store all of this data.

Define the np array with the names of the ‘columns’ and the data format for each

  • U10 represents a 10-character Unicode string
  • i4 is short for int32 (i for int, 4 for 4 bytes)
  • f8 is shorthand for float64
student_data = np.zeros(4, dtype={'names':('name', 'studentId', 'score'),
                          'formats':('U10', 'i4', 'f8')})

np.zeros() for a string sets it to an empty string

student_data
array([('', 0, 0.), ('', 0, 0.), ('', 0, 0.), ('', 0, 0.)],
      dtype=[('name', '<U10'), ('studentId', '<i4'), ('score', '<f8')])
print(student_data.dtype)
[('name', '<U10'), ('studentId', '<i4'), ('score', '<f8')]

Now that we’ve created an empty container array, we can fill the array with our lists of values

student_data['name'] = name
student_data['studentId'] = studentId
student_data['score'] = score
print(student_data)
[('Alice', 1, 85.4 ) ('Beth', 2, 90.4 ) ('Cathy', 3, 87.66)
 ('Dorothy', 4, 78.9 )]

The handy thing with structured arrays is that you can now refer to values either by index or by name

student_data['name']
array(['Alice', 'Beth', 'Cathy', 'Dorothy'], dtype='<U10')
student_data['studentId']
array([1, 2, 3, 4])
student_data['score']
array([85.4 , 90.4 , 87.66, 78.9 ])

If you index student_data at position 1 you get a structure:

student_data[1]
('Beth', 2, 90.4)

Get the name attribute from the last row

student_data[-1]['name']
'Dorothy'

Get names where score is above 85

student_data[student_data['score'] > 85]['name']
array(['Alice', 'Beth', 'Cathy'], dtype='<U10')

Note that if you’d like to do any operations that are more complicated than these, you should probably consider the pandas package, which provides a powerful data structure called the DataFrame.
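Before reaching for another package, a few record-style operations are still available in NumPy itself. The sketch below rebuilds the same student records in one call and sorts them by a field; it is an illustration and not a replacement for the step-by-step construction above.

```python
import numpy as np

student_data = np.array(
    [("Alice", 1, 85.4), ("Beth", 2, 90.4),
     ("Cathy", 3, 87.66), ("Dorothy", 4, 78.9)],
    dtype=[("name", "U10"), ("studentId", "i4"), ("score", "f8")],
)

# Sort the records by the 'score' field (ascending)
by_score = np.sort(student_data, order="score")
print(by_score["name"])   # ['Dorothy' 'Alice' 'Cathy' 'Beth']

# Fields behave like ordinary arrays
best = student_data["name"][student_data["score"].argmax()]
print(best)               # Beth
```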

ITERATIONS

1D Arrays

a = np.arange(11)**2
a
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])
# Iterating over an array
for i in a:
    print(i, end=" ")
0 1 4 9 16 25 36 49 64 81 100 
# Iterating over an array and doubling each element
for i in a:
    print(i * 2, end = " ")
0 2 8 18 32 50 72 98 128 162 200 

Multi-Dimensional Arrays

Iterating over multidimensional arrays is done with respect to the first axis:

students = np.array([['Alice','Beth','Cathy','Dorothy'],
                     [65,78,90,81],
                     [71,82,79,92]])

Each iteration will be over the rows of the array

for i in students:
    print('i = ', i)
i =  ['Alice' 'Beth' 'Cathy' 'Dorothy']
i =  ['65' '78' '90' '81']
i =  ['71' '82' '79' '92']

Flatten a multi-dimensional array

If one wants to perform an operation on each element in the array, one can use the flatten() method, which flattens the array to a single dimension.
By default, the flattening occurs row-wise (also known as C order).

for element in students.flatten():
    print(element, end = " ")
Alice Beth Cathy Dorothy 65 78 90 81 71 82 79 92 

Fortran order flattening

To flatten a 2D array column-wise, use the Fortran order

for element in students.flatten(order='F'):
    print(element, end = " ")
Alice 65 71 Beth 78 82 Cathy 90 79 Dorothy 81 92 

nditer

Efficient multi-dimensional iterator object to iterate over arrays

x = np.arange(12).reshape(3,4)
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Default iteration behavior is C order

This is row-wise iteration, similar to iterating over a C-order flattened array

for i in np.nditer(x):
    print(i, end = " ")
0 1 2 3 4 5 6 7 8 9 10 11 

Fortran order

This is like iterating over an array which has been flattened column-wise

for i in np.nditer(x, order = 'F'): 
    print(i, end = " ")
0 4 8 1 5 9 2 6 10 3 7 11 

Flags

There are a number of flags which we can pass as a list to nditer. Many of these involve setting buffering options.
If we want to iterate over each column, we can use the flags argument with the value ‘external_loop’ together with Fortran order.

for i in np.nditer(x, order = 'F', flags = ['external_loop']): 
    print(i)
[0 4 8]
[1 5 9]
[ 2  6 10]
[ 3  7 11]

Modifying Array Values

By default, the nditer treats the input array as a read-only object. To modify the array elements, you must specify either read-write or write-only mode. This is controlled with per-operand flags.

Writing to a read-only array results in an error:
for arr in np.nditer(x):
    arr[...] = arr * arr
ValueError: assignment destination is read-only
We set op_flags to make the array read-write:
for arr in np.nditer(x, op_flags = ['readwrite']):
    arr[...] = arr * arr
x
array([[  0,   1,   4,   9],
       [ 16,  25,  36,  49],
       [ 64,  81, 100, 121]])
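Another flag that is sometimes handy is ‘multi_index’, which tracks the position of each element during iteration. The sketch below collects (index, value) pairs from a small array; the list name `positions` is illustrative.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

positions = []
it = np.nditer(x, flags=["multi_index"])
for value in it:
    # it.multi_index is the (row, column) of the current element
    positions.append((it.multi_index, int(value)))

print(positions[:3])   # [((0, 0), 0), ((0, 1), 1), ((0, 2), 2)]
print(positions[-1])   # ((1, 2), 5)
```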

Array Operations

Copy

  • Copies array to new memory
  • Syntax: np.copy(array)
# create an array `A1`
A1 = np.arange(10)
print(A1)
[0 1 2 3 4 5 6 7 8 9]
# copy `A1` into A2 
A2 = np.copy(A1)
print(A2)
[0 1 2 3 4 5 6 7 8 9]

View

  • Creates a view of the array's memory, reinterpreted as the given type (dtype); no data is copied
  • Syntax: array.view(np.dtype)
# view of array A2 
A3 = A2.view(np.float16)
print(A3)
[0.0e+00 0.0e+00 6.0e-08 0.0e+00 1.2e-07 0.0e+00 1.8e-07 0.0e+00 2.4e-07
 0.0e+00 3.0e-07 0.0e+00 3.6e-07 0.0e+00 4.2e-07 0.0e+00 4.8e-07 0.0e+00
 5.4e-07 0.0e+00]
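The float16 output above looks odd because the same bytes are being read as a different type. The more common use of view() is without a dtype, which gives a second array object over the same data. A sketch contrasting this with copy(), using made-up names:

```python
import numpy as np

original = np.arange(5)

alias = original.view()        # shares memory with `original`
independent = original.copy()  # has its own memory

alias[0] = 99         # visible through `original`
independent[1] = -1   # not visible through `original`

print(original)                            # [99  1  2  3  4]
print(np.shares_memory(original, alias))   # True
```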

Sorting

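  • Sorts an array in place.
  • Syntax: array.sort(axis)
    • np.sort(array, axis) returns a sorted copy instead and leaves the original unchanged
    • axis = -1 (default); sorts along the last axis
    • axis = 0; sorts down each column
    • axis = 1; sorts across each row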
# Unsorted array
A4 = np.array([9, 2, 3,1, 5, 10])
print(A4) 
[ 9  2  3  1  5 10]
# Call sort function
A4.sort()
print(A4)
[ 1  2  3  5  9 10]
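The method and the function differ in what they return. The following sketch, using the same values as A4, shows that np.sort() hands back a new array while array.sort() reorders the existing one and returns None.

```python
import numpy as np

data = np.array([9, 2, 3, 1, 5, 10])

sorted_copy = np.sort(data)   # new array; `data` keeps its order
print(data)                   # [ 9  2  3  1  5 10]

return_value = data.sort()    # sorts `data` itself
print(return_value)           # None
print(data)                   # [ 1  2  3  5  9 10]
```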

MORE EXAMPLES

# Row and column unsorted
A5 = np.array([[4, 1, 3], [9, 5, 8]])
print(A5) 
[[4 1 3]
 [9 5 8]]
A5[0]
array([4, 1, 3])
A5[0][1]
1
A5[1]
array([9, 5, 8])
A5[1][2]
8

SORT AXIS WISE

# Sort along axis=1 (the values within each row)
A5 = np.array([[4, 1, 3], [9, 5, 8]])
np.sort(A5,axis=1)
array([[1, 3, 4],
       [5, 8, 9]])
# Sort along axis=0 (the values within each column)
A5 = np.array([[4, 1, 3], [9, 5, 8]])
np.sort(A5,axis=0)
array([[4, 1, 3],
       [9, 5, 8]])

Flatten: Flattens 2D array to 1D array

A6 = np.array([[4, 1, 3], [9, 5, 8]])
A6 
array([[4, 1, 3],
       [9, 5, 8]])
# 2D array
A6 = np.array([[4, 1, 3], [9, 5, 8]])
# 1D array 
A6.flatten()
array([4, 1, 3, 9, 5, 8])

Transpose: Transposes array (rows become columns and vice versa)

A7 = np.array([[4, 1, 3], [9, 5, 8]])
A7
array([[4, 1, 3],
       [9, 5, 8]])
# Transpose A7 
A7.T
array([[4, 9],
       [1, 5],
       [3, 8]])

Reshape: Reshapes an array to r rows and c columns without changing its data

A8 = np.array([(8,9,10),(11,12,13)])
A8
array([[ 8,  9, 10],
       [11, 12, 13]])
# Reshape --> 3x2
A8.reshape(3,2)
array([[ 8,  9],
       [10, 11],
       [12, 13]])

Resize: Changes the array's shape to r x c in place; if the new shape has more elements, the extra ones are filled with 0

A9 = np.array([(8,9,10),(11,12,13)])
A9
array([[ 8,  9, 10],
       [11, 12, 13]])
# Resize 
A9.resize(3, 2)
A9
array([[ 8,  9],
       [10, 11],
       [12, 13]])
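
The example above keeps the number of elements unchanged, so no filling happens. The sketch below enlarges an array instead and compares the in-place method with the np.resize function, which behaves differently. The names small, repeated and padded are illustrative; refcheck=False is passed only so the call does not fail in interactive sessions that hold extra references.

```python
import numpy as np

small = np.array([1, 2, 3])

# np.resize returns a new array and repeats the data to fill extra slots
repeated = np.resize(small, 5)
print(repeated.tolist())     # [1, 2, 3, 1, 2]

# ndarray.resize works in place and pads with zeros when enlarging
padded = np.array([1, 2, 3])
padded.resize((5,), refcheck=False)
print(padded.tolist())       # [1, 2, 3, 0, 0]
```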

Array Shape Manipulation

import numpy as np

1. Flattening

a = np.array([("Germany","France", "Hungary","Austria"),
              ("Berlin","Paris", "Budapest","Vienna" )]) 
a
array([['Germany', 'France', 'Hungary', 'Austria'],
       ['Berlin', 'Paris', 'Budapest', 'Vienna']], dtype='<U8')
a.shape
(2, 4)

The ravel() function

flatten() is a method of an ndarray object and always returns a copy. ravel() is available both as an ndarray method and as the library-level function np.ravel(), which can be called on any array-like object that can be parsed successfully; it returns a view of the original data whenever possible. For example, np.ravel() will work on a list of ndarrays, while flatten() will not.

a.ravel()
array(['Germany', 'France', 'Hungary', 'Austria', 'Berlin', 'Paris',
       'Budapest', 'Vienna'], dtype='<U8')
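
A minimal sketch of the copy-versus-view behaviour, using a small numeric array (grid is an illustrative name) so the effect of writing through the result is easy to see.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_copy = grid.flatten()   # always a copy
flat_view = grid.ravel()     # a view here, because grid is contiguous

print(np.shares_memory(grid, flat_copy))   # False
print(np.shares_memory(grid, flat_view))   # True

# Writing through the view changes the original array
flat_view[0] = 99
print(grid[0, 0])            # 99
print(flat_copy[0])          # 0

# np.ravel also accepts array-like input such as nested lists
print(np.ravel([[1, 2], [3, 4]]).tolist())   # [1, 2, 3, 4]
```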

T gives transpose of an array

a.T   
array([['Germany', 'Berlin'],
       ['France', 'Paris'],
       ['Hungary', 'Budapest'],
       ['Austria', 'Vienna']], dtype='<U8')
a.T.ravel()
array(['Germany', 'Berlin', 'France', 'Paris', 'Hungary', 'Budapest',
       'Austria', 'Vienna'], dtype='<U8')

2. Reshaping

reshape() gives a new shape to an array without changing its data.

a.shape
(2, 4)
a.reshape(4,2)
array([['Germany', 'France'],
       ['Hungary', 'Austria'],
       ['Berlin', 'Paris'],
       ['Budapest', 'Vienna']], dtype='<U8')
np.arange(15).reshape(3,5)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
np.arange(15).reshape(5,3)
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

The reshape() dimensions need to match the number of values in the array

Reshaping a 15-element array to an 18-element one will throw an error

np.arange(15).reshape(3,6)
ValueError: cannot reshape array of size 15 into shape (3,6)

Specify only one dimension (and infer the others) when reshaping

Another way to reshape is to specify all dimensions but one and pass -1 for the remaining one. -1 means that the length of that dimension is inferred from the number of elements

countries = np.array(["Germany","France", "Hungary","Austria","Italy","Denmark"])
countries
array(['Germany', 'France', 'Hungary', 'Austria', 'Italy', 'Denmark'],
      dtype='<U7')
Here the unspecified value is inferred to be 2
countries.reshape(-1,3) 
array([['Germany', 'France', 'Hungary'],
       ['Austria', 'Italy', 'Denmark']], dtype='<U7')
Here the unspecified value is inferred to be 2
countries.reshape(3,-1) 
array([['Germany', 'France'],
       ['Hungary', 'Austria'],
       ['Italy', 'Denmark']], dtype='<U7')

If the values of the dimensions are not factors of the number of elements, there will be an error

countries.reshape(4,-1)
ValueError: cannot reshape array of size 6 into shape (4,newaxis)
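
The inference rule can be sketched with a numeric array: the -1 is replaced by the total size divided by the product of the other dimensions, and only one -1 is allowed. The array data and the flag raised are illustrative names.

```python
import numpy as np

data = np.arange(12)

# -1 is replaced by size // (product of the other dimensions)
shape_2d = data.reshape(2, -1).shape
shape_3d = data.reshape(2, -1, 3).shape
print(shape_2d)   # (2, 6)
print(shape_3d)   # (2, 2, 3)

# Only one dimension may be left unspecified
raised = False
try:
    data.reshape(-1, -1)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```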

Arithmetic Operations

Element-wise operations work directly on arrays of the same shape. When the shapes differ, the operation is still possible in many cases thanks to NumPy's broadcasting capability, which is covered in the Broadcasting section.

a = np.array([10,10,10])
b = np.array([5,5,5])
a + b
array([15, 15, 15])
a - b
array([5, 5, 5])
a * b
array([50, 50, 50])
a / b
array([2., 2., 2.])

a % 3  
array([1, 1, 1], dtype=int32)
a < 35
array([ True,  True,  True])
a > 25
array([False, False, False])
a ** 2
array([100, 100, 100])

dot function or method

A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0], [3,4]] )

print('A:\n', A)
print('B:\n', B)
A:
 [[1 1]
 [0 1]]
B:
 [[2 0]
 [3 4]]

This gives element-wise multiplication

A * B
array([[2, 0],
       [0, 4]])

This gives the matrix multiplication

A.dot(B)
array([[5, 4],
       [3, 4]])
np.dot(A,B)
array([[5, 4],
       [3, 4]])

Modifying an existing array rather than creating a new one

a  *= 3
a
array([30, 30, 30])
b += a
b
array([35, 35, 35])

Unary Operators

ages = np.array([12,15,18,20])
ages.sum()
65
ages.min()
12
ages.max()
20

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array

numbers = np.arange(12).reshape(3,4)
numbers
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Row and column operations

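In a 2D array, axis 0 runs down the rows, so reducing along axis 0 gives one result per column. Axis 1 runs across the columns, so reducing along axis 1 gives one result per row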

Sum up each column

numbers.sum(axis=0) 
array([12, 15, 18, 21])

Sum up each row

numbers.sum(axis=1)
array([ 6, 22, 38])

Minimum of each row

numbers.min(axis=1) 
array([0, 4, 8])
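
One way to keep the axis convention straight is to look at the shape of the result: the axis passed to the function is the one that disappears. The sketch below also shows keepdims=True, which keeps the reduced axis with length 1 so the result lines up with the original array; row_totals and shares are illustrative names.

```python
import numpy as np

numbers = np.arange(12).reshape(3, 4)

# Reducing along axis 0 collapses the 3 rows: one result per column
print(numbers.sum(axis=0).shape)   # (4,)

# Reducing along axis 1 collapses the 4 columns: one result per row
print(numbers.sum(axis=1).shape)   # (3,)

# keepdims=True keeps the reduced axis with length 1
row_totals = numbers.sum(axis=1, keepdims=True)
print(row_totals.shape)            # (3, 1)

# That shape lines up with the original, e.g. each value's share of its row total
shares = numbers / row_totals
print(np.allclose(shares.sum(axis=1), 1.0))   # True
```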

Splitting Arrays

1.split

Split an array into multiple sub-arrays, either by specifying the number of equally sized sub-arrays to return or by specifying the indices at which the division should occur

np.split(array, indices_or_sections, axis=0)

x = np.arange(9)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
print('Split the array in 3 equal-sized subarrays:' )
np.split(x, 3)
Split the array in 3 equal-sized subarrays:
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

The number of splits must be a divisor of the number of elements

Otherwise NumPy raises a ValueError because an equal split is not possible, for example with np.split(x, 4)

print('Split the array at positions indicated in 1-D array:' )
np.split(x,[4,7])
Split the array at positions indicated in 1-D array:
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8])]
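
When an equal division is not required, np.array_split can be used instead of np.split. The following is a small sketch of the difference; parts and raised are illustrative names.

```python
import numpy as np

x = np.arange(9)

# np.split insists on equal-sized pieces
raised = False
try:
    np.split(x, 4)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# np.array_split tolerates a remainder; the earlier pieces get the extra items
parts = [p.tolist() for p in np.array_split(x, 4)]
print(parts)   # [[0, 1, 2], [3, 4], [5, 6], [7, 8]]
```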

2.hsplit

numpy.hsplit is a special case of split() with axis=1, which splits an array horizontally (column-wise). For a 1-D input it splits along axis 0 instead.
In this example, the array is split between columns

y = np.array([("Germany","France", "Hungary","Austria"),
              ("Berlin","Paris", "Budapest","Vienna" )]) 
y
array([['Germany', 'France', 'Hungary', 'Austria'],
       ['Berlin', 'Paris', 'Budapest', 'Vienna']], dtype='<U8')
p1, p2 = np.hsplit(y, 2)
print(p1)
[['Germany' 'France']
 ['Berlin' 'Paris']]
print(p2)
[['Hungary' 'Austria']
 ['Budapest' 'Vienna']]
np.hsplit(y,4)
[array([['Germany'],
        ['Berlin']], dtype='<U8'),
 array([['France'],
        ['Paris']], dtype='<U8'),
 array([['Hungary'],
        ['Budapest']], dtype='<U8'),
 array([['Austria'],
        ['Vienna']], dtype='<U8')]

3.vsplit

vsplit splits along the vertical axis

p_1,p_2 = np.vsplit(y, 2)
print(p_1)
[['Germany' 'France' 'Hungary' 'Austria']]
print(p_2)
[['Berlin' 'Paris' 'Budapest' 'Vienna']]

Array Unpacking

An alternative approach is array unpacking. In this example, we unpack the array into two variables. Unpacking iterates over the first dimension of an array, so a 2D array is unpacked row by row

countries,capitals = y
print('Countries: ')
print(countries)
print('Capitals: ')
print(capitals)
Countries: 
['Germany' 'France' 'Hungary' 'Austria']
Capitals: 
['Berlin' 'Paris' 'Budapest' 'Vienna']

To get the columns, just transpose the array.

b1,b2,b3,b4 = y.T
print("b1: ")
print(b1)
print("b2: ")
print(b2)
print("b3: ")
print(b3)
print("b4: ")
print(b4)
b1: 
['Germany' 'Berlin']
b2: 
['France' 'Paris']
b3: 
['Hungary' 'Budapest']
b4: 
['Austria' 'Vienna']

View vs Copy

  • When the contents are physically stored in another location, the new array is called a copy.

  • When a different array object looks at the same memory content, it is called a view.

  • Different array objects can share the same data. The ndarray.view() method creates a new array object that looks at the same data as the original array.

  • Changing the shape of the view does not change the shape of the original.

import numpy as np
fruits = np.array(["Apple","Mango","Grapes","Watermelon"])

We will now create two baskets as views of fruits

basket_1 = fruits.view()
basket_2 = fruits.view()
print(basket_1)
print(basket_2)
['Apple' 'Mango' 'Grapes' 'Watermelon']
['Apple' 'Mango' 'Grapes' 'Watermelon']

MEMORY LOCATION

print("ids for the arrays are different.")
print("id for fruits is : ")
print(id(fruits))
print("id for baskets is : ")
print(id(basket_1))
print(id(basket_2))
ids for the arrays are different.
id for fruits is : 
2630436222704
id for baskets is : 
2630436223760
2630436223568
basket_1 is fruits
False
Each basket is a view of the data owned by fruits
basket_1.base is fruits 
True

Change a few elements of basket. It changes the elements of fruits

Here, we assign a new value to the first element of basket_2. You might be surprised that fruits has “automatically” changed as well. The explanation is that basket_2 itself was not reassigned; only one of its elements was modified, and that element lives in the memory shared with fruits.

basket_2[0] = "Strawberry"
basket_2
array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')
fruits
array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')
And this also affects basket_1
basket_1
array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')

Reassign basket_1 to a new array. It does not change fruits

basket_1 = np.array(["Peach","Pineapple","Banana","Orange"])
basket_1
array(['Peach', 'Pineapple', 'Banana', 'Orange'], dtype='<U9')
fruits
array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')

In this case, new memory has been allocated for basket_1, because we have assigned a completely new array to this variable

Change the shape of basket. It does not change the shape of fruits

basket_2.shape = 2,2
print("basket_2: ")
print(basket_2)
basket_2: 
[['Strawberry' 'Mango']
 ['Grapes' 'Watermelon']]
print("Shape of fruits: ")
print(fruits)
Shape of fruits: 
['Strawberry' 'Mango' 'Grapes' 'Watermelon']

Slicing an array returns a view of it

mini_basket = fruits[2:]
mini_basket
array(['Grapes', 'Watermelon'], dtype='<U10')
fruits[3] = "Peach"
mini_basket
array(['Grapes', 'Peach'], dtype='<U10')
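
Not every indexing operation returns a view. As a hedged sketch: basic slicing shares memory with the original, while indexing with a list of integers produces a copy. The names sliced and picked are illustrative.

```python
import numpy as np

fruits = np.array(["Apple", "Mango", "Grapes", "Watermelon"])

sliced = fruits[2:]        # basic slicing: a view
picked = fruits[[2, 3]]    # integer-list indexing: a copy

print(np.shares_memory(fruits, sliced))   # True
print(np.shares_memory(fruits, picked))   # False

# A later change to fruits shows up in the view but not in the copy
fruits[3] = "Peach"
print(sliced.tolist())   # ['Grapes', 'Peach']
print(picked.tolist())   # ['Grapes', 'Watermelon']
```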

Deep Copy

The copy() method makes a complete copy of the array and its data, and doesn’t share with the original array.

import numpy as np
fruits = np.array(["Apple","Mango","Grapes","Watermelon"])

We now Create a deep copy of fruits

basket = fruits.copy()
basket
array(['Apple', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')
basket is fruits
False
basket.base is fruits  # basket doesn't share anything with fruits
False

Change the contents or shape of basket. It does not change the contents of fruits

basket [0] = "Strawberry"
basket
array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')
fruits
array(['Apple', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')
basket.shape = 2,2
print("Shape of basket: ")
print(basket)
Shape of basket: 
[['Strawberry' 'Mango']
 ['Grapes' 'Watermelon']]
print("Shape of fruits: ")
print(fruits)
Shape of fruits: 
['Apple' 'Mango' 'Grapes' 'Watermelon']

Broadcasting

  • Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations.

  • Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.

  • NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

a = np.array([1,2,3,4,5])
b = np.array([10,10,10,10,10])
a * b
array([10, 20, 30, 40, 50])

If the shapes of two arrays differ, a plain element-by-element pairing is not possible. However, operations on arrays of different shapes are still possible in NumPy because of broadcasting: the smaller array is broadcast to the size of the larger array so that they have compatible shapes.

Scalar and One-Dimensional Array

A single value or scalar can be used in arithmetic with a one-dimensional array.

c = 10
a * c
array([10, 20, 30, 40, 50])

The result is equivalent to the previous example where b was an array. We can think of the scalar c being stretched during the arithmetic operation into an array with the same shape as a. The new elements in c are simply copies of the original scalar.

Scalar and n-Dimensional Array

A scalar value can be used in arithmetic with an n-dimensional array.

one = np.ones((4,3))
one * c
array([[10., 10., 10.],
       [10., 10., 10.],
       [10., 10., 10.],
       [10., 10., 10.]])

One-Dimensional and n-Dimensional Arrays

A one-dimensional array can be used in arithmetic with an n-dimensional array.

Consider the following example: We have the heights (in cm) and weights (in kg) of a group of students. We store this information in an array called student_bio.

heights  = [165,170,168,183,172,169]
weights  = [61,76,56,81,62,60]
student_bio = np.array([heights,weights])
student_bio
array([[165, 170, 168, 183, 172, 169],
       [ 61,  76,  56,  81,  62,  60]])

Now, we would like to convert heights into feet and weights into pounds. The conversion factors are 0.0328084 (cm to feet) and 2.20462 (kg to pounds) respectively

factor_1 = np.array([0.0328084,2.20462 ])
factor_1
array([0.0328084, 2.20462  ])
factor_1.shape
(2,)
student_bio.shape
(2, 6)

General Broadcasting Rules

When operating on two arrays, NumPy compares their shapes dimension by dimension. The dimensions are considered in reverse order, starting with the trailing (rightmost) dimension and working towards the left. Two dimensions are compatible when

  1. they are equal,
  2. one of them is of size 1
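
The compatibility rule can be written out as a few lines of plain Python. This is only an illustrative sketch: broadcast_shape is a hypothetical helper, not a NumPy function, and it treats a missing leading dimension as size 1. The last line cross-checks the result against NumPy's own np.broadcast.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; missing dimensions count as 1
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"mismatch: {dim_a} vs {dim_b}")
    return tuple(reversed(result))

print(broadcast_shape((2, 6), (2, 1)))   # (2, 6)
print(broadcast_shape((4, 3), (3,)))     # (4, 3)

# Cross-check against NumPy itself
print(np.broadcast(np.empty((2, 6)), np.empty((2, 1))).shape)   # (2, 6)
```

Calling broadcast_shape((2, 6), (2,)) raises ValueError, because the trailing dimensions 6 and 2 are neither equal nor 1.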

Shape mismatch

This fails because there is a mismatch in the trailing dimensions
student_bio: 2 x 6
factor_1:        2

The trailing dimensions here are 6 and 2, so there is a mismatch

student_bio * factor_1
ValueError: operands could not be broadcast together with shapes (2,6) (2,) 
To fix this, we store the factors as a column of shape (2, 1)
factor_2 = np.array([[0.0328084],[2.20462 ]])
factor_2
array([[0.0328084],
       [2.20462  ]])
factor_2.shape
(2, 1)

Dimensions match

The dimensions are:
2 x 6
2 x 1

Here, the last dimensions are 6,1 so they match on account of one of them being 1
The next dimensions are 2,2 so there is a match

student_bio * factor_2
array([[  5.413386 ,   5.577428 ,   5.5118112,   6.0039372,   5.6430448,
          5.5446196],
       [134.48182  , 167.55112  , 123.45872  , 178.57422  , 136.68644  ,
        132.2772   ]])

Why did we encounter an error for the first try?

Broadcasting is possible only when the rules above are satisfied; it does not work for every pair of shapes. In the first attempt, the trailing dimensions were 6 and 2, which are neither equal nor 1.

The dimensions with size 1 are stretched or “copied” to match the other.

After application of the broadcasting rules, the sizes of all arrays must match.

In the above example, factor_2 is stretched to match the dimensions of student_bio in order to carry out the operation.
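
Instead of typing the factors as nested lists, the 1-D array can be given a trailing axis of length 1 with np.newaxis. The sketch below reuses the values from this section; factor_col and converted are illustrative names, and np.broadcast_to is used only to display the stretched array that broadcasting works with conceptually.

```python
import numpy as np

student_bio = np.array([[165, 170, 168, 183, 172, 169],
                        [61, 76, 56, 81, 62, 60]])
factor_1 = np.array([0.0328084, 2.20462])

# Add a trailing axis of length 1: shape (2,) becomes (2, 1)
factor_col = factor_1[:, np.newaxis]
print(factor_col.shape)                  # (2, 1)

converted = student_bio * factor_col
print(converted.shape)                   # (2, 6)

# The stretched factors: each row repeats one conversion factor
stretched = np.broadcast_to(factor_col, student_bio.shape)
print(stretched[:, :2])
```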

Vector Stacking

import numpy as np

1. concatenate

The arrays must have the same shape, except in the dimension corresponding to axis. The default axis along which the arrays will be joined is 0.

x = np.array([["Germany","France"],["Berlin","Paris"]])
y = np.array([["Hungary","Austria"],["Budapest","Vienna"]])
print(x)
print(x.shape)
[['Germany' 'France']
 ['Berlin' 'Paris']]
(2, 2)
print(y)
print(y.shape)
[['Hungary' 'Austria']
 ['Budapest' 'Vienna']]
(2, 2)

The default is row-wise concatenation for a 2D array

print('Joining two arrays along axis 0')
np.concatenate((x,y))
Joining two arrays along axis 0
array([['Germany', 'France'],
       ['Berlin', 'Paris'],
       ['Hungary', 'Austria'],
       ['Budapest', 'Vienna']], dtype='<U8')
Column-wise concatenation (axis = 1)
print('Joining two arrays along axis 1')
np.concatenate((x,y), axis = 1)
Joining two arrays along axis 1
array([['Germany', 'France', 'Hungary', 'Austria'],
       ['Berlin', 'Paris', 'Budapest', 'Vienna']], dtype='<U8')

2. stack

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b))
array([[1, 2, 3],
       [2, 3, 4]])
studentId = np.array([1,2,3,4])
name   = np.array(["Alice","Beth","Cathy","Dorothy"])
scores  = np.array([65,78,90,81])
np.stack((studentId, name, scores))
array([['1', '2', '3', '4'],
       ['Alice', 'Beth', 'Cathy', 'Dorothy'],
       ['65', '78', '90', '81']], dtype='<U11')
np.stack((studentId, name, scores)).shape
(3, 4)
np.stack((studentId, name, scores), axis =1)
array([['1', 'Alice', '65'],
       ['2', 'Beth', '78'],
       ['3', 'Cathy', '90'],
       ['4', 'Dorothy', '81']], dtype='<U11')
np.stack((studentId, name, scores), axis =1).shape
(4, 3)

3. vstack

Stacks arrays vertically (row-wise); 1-D arrays become the rows of a 2-D array

np.vstack((studentId, name, scores)) 
array([['1', '2', '3', '4'],
       ['Alice', 'Beth', 'Cathy', 'Dorothy'],
       ['65', '78', '90', '81']], dtype='<U11')

4. hstack

Stacks arrays horizontally (column-wise); 1-D arrays are simply joined end to end

np.hstack((studentId, name, scores)) 
array(['1', '2', '3', '4', 'Alice', 'Beth', 'Cathy', 'Dorothy', '65',
       '78', '90', '81'], dtype='<U11')
np.hstack((studentId, name, scores)).shape
(12,)

The functions concatenate, stack and block provide more general stacking and concatenation operations.
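
The result shapes make the differences between these functions easier to compare. A small sketch with two 1-D arrays follows; np.column_stack is not used elsewhere in this guide and is included only for comparison.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

print(np.concatenate((a, b)).shape)     # (6,)   joins along the existing axis
print(np.stack((a, b)).shape)           # (2, 3) adds a new leading axis
print(np.stack((a, b), axis=1).shape)   # (3, 2) adds a new trailing axis
print(np.vstack((a, b)).shape)          # (2, 3) 1-D inputs become rows
print(np.hstack((a, b)).shape)          # (6,)   1-D inputs are joined end to end
print(np.column_stack((a, b)).shape)    # (3, 2) 1-D inputs become columns
```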

Universal Functions

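NumPy provides standard trigonometric functions, functions for arithmetic operations, functions for handling complex numbers, and more. Those that operate element by element on arrays are called “universal functions” (ufuncs). Statistical functions such as np.mean are ordinary NumPy functions rather than ufuncs, but they are covered in this section as well.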

import numpy as np

Trigonometric Functions

angles = np.array([0,30,45,60,90]) 

Angles need to be converted to radians by multiplying by pi/180

Only then can we apply trigonometric functions to our array

angles_radians = angles * np.pi/180
angles_radians
array([0.        , 0.52359878, 0.78539816, 1.04719755, 1.57079633])
print('Sine of angles in the array:')
print(np.sin(angles_radians))      
Sine of angles in the array:
[0.         0.5        0.70710678 0.8660254  1.        ]

Alternatively, use the np.radians() function to convert to radians

angles_radians = np.radians(angles)
angles_radians
array([0.        , 0.52359878, 0.78539816, 1.04719755, 1.57079633])
print('Cosine of angles in the array:')
print(np.cos(angles_radians))
Cosine of angles in the array:
[1.00000000e+00 8.66025404e-01 7.07106781e-01 5.00000000e-01
 6.12323400e-17]
print('Tangent of angles in the array:')
print(np.tan(angles_radians))
Tangent of angles in the array:
[0.00000000e+00 5.77350269e-01 1.00000000e+00 1.73205081e+00
 1.63312394e+16]
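
The tiny cosine of 90 degrees and the huge tangent of 90 degrees above are floating-point artifacts: pi/2 cannot be represented exactly. A minimal sketch of how such values can be compared safely, using np.isclose with its default tolerances:

```python
import numpy as np

angles_radians = np.radians(np.array([0, 30, 45, 60, 90]))
cosines = np.cos(angles_radians)

# cos(90 degrees) is not exactly 0 in floating point
print(cosines[-1] == 0.0)             # False
print(np.isclose(cosines[-1], 0.0))   # True

# Rounding gives a cleaner display
print(np.round(cosines, 4))
```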

MATH

arcsin, arccos, and arctan return the inverse sine, cosine, and tangent of the given value. The results are in radians and can be verified by converting them to degrees with the numpy.degrees() function.

sin = np.sin(angles * np.pi/180) 
print ('Compute sine inverse of angles. Returned values are in radians.')

inv = np.arcsin(sin) 
print (inv) 
Compute sine inverse of angles. Returned values are in radians.
[0.         0.52359878 0.78539816 1.04719755 1.57079633]

np.degrees() converts radians to degrees

print ('Check result by converting to degrees:' )
print (np.degrees(inv)) 
Check result by converting to degrees:
[ 0. 30. 45. 60. 90.]

Statistical Functions

test_scores = np.array([32.32, 56.98, 21.52, 44.32, 
                        55.63, 13.75, 43.47, 43.34])
print('Mean test scores of the students: ')
print(np.mean(test_scores))
Mean test scores of the students: 
38.91625
print('Median test scores of the students: ')
print(np.median(test_scores))
Median test scores of the students: 
43.405

We will now apply basic statistical functions to a real-life dataset: salary data for 1147 European developers.

salaries = np.genfromtxt('salary.csv', 
                         delimiter=',')
salaries
array([60000., 58000., 56967., ..., 54647., 25000., 70000.])
salaries.shape
(1147,)

METRICS

mean     = np.mean(salaries)
median   = np.median(salaries)
sd       = np.std(salaries)
variance = np.var(salaries)
print('Mean = %i' %mean)
print('Median = %i' %median)
print('Standard Deviation = %i' %sd)
print('Variance = %i' %variance)
Mean = 55894
Median = 48000
Standard Deviation = 55170
Variance = 3043770333
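
By default np.std and np.var divide by N (the population formulas). When the data is a sample, passing ddof=1 divides by N - 1 instead. The sketch below uses a small hypothetical array, not values from salary.csv.

```python
import numpy as np

# Hypothetical sample, not taken from salary.csv
sample = np.array([48000., 52000., 60000., 45000., 150000.])

pop_std = np.std(sample)              # divides by N (ddof=0, the default)
sample_std = np.std(sample, ddof=1)   # divides by N - 1

print(pop_std < sample_std)                        # True
print(np.isclose(np.var(sample), pop_std ** 2))    # True: variance is std squared

# A single large value pulls the mean above the median
print(np.mean(sample) > np.median(sample))         # True
```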

MISC FUNCTIONS

argmax

Returns the indices of the maximum values along an axis.

np.argmax(salaries)
246
salaries[246]
850000.0

argmin

Returns the indices of the minimum values along an axis.

np.argmin(salaries)
282
salaries[282]
11400.0

argsort

Returns the indices that would sort an array.

np.argsort(salaries)
array([282, 969, 606, ..., 829, 389, 246], dtype=int64)

Specify a sorting algorithm

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.

np.argsort(salaries,kind='mergesort')
array([282, 969, 606, ..., 829, 389, 246], dtype=int64)
View the sorted salary values
salaries[np.argsort(salaries,kind='mergesort')]
array([ 11400.,  11642.,  12000., ..., 680000., 820000., 850000.])
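
argsort always sorts in ascending order. A common idiom for the largest values is to reverse the index array and take the first few entries. The sketch uses hypothetical values standing in for the salary data.

```python
import numpy as np

# Hypothetical values standing in for the salary data
values = np.array([60000., 11400., 850000., 48000., 70000.])

order = np.argsort(values)   # indices for ascending order
top3 = order[::-1][:3]       # reverse, then keep the first three

print(top3.tolist())             # [2, 4, 0]
print(values[top3].tolist())     # [850000.0, 70000.0, 60000.0]
```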

The functions max, min, and sort return the elements themselves instead of their indices.

where

The where() function returns the indices of elements in an input array where the given condition is satisfied.

greater_than_100 = np.where(salaries > 100000)
greater_than_100
(array([  22,   27,   33,   45,   48,   59,   77,   78,   87,   91,   94,
         102,  109,  117,  123,  151,  221,  236,  242,  243,  246,  271,
         296,  297,  303,  337,  343,  378,  385,  389,  402,  408,  432,
         436,  450,  454,  503,  504,  538,  560,  563,  580,  581,  602,
         607,  687,  701,  728,  736,  745,  769,  779,  802,  819,  822,
         829,  859,  913,  922,  946,  949,  977,  980,  998, 1019, 1033,
        1062, 1065, 1104, 1105, 1112, 1130], dtype=int64),)
len(greater_than_100[0])
72
salaries[greater_than_100]
array([261546., 500000., 120000., 210000., 105227., 115000., 130773.,
       103529., 119875., 132000., 141671., 109100., 550000., 180000.,
       220000., 109294., 135380., 103529., 120000., 185000., 850000.,
       181339., 111600., 102000., 625000., 110464., 102000., 140000.,
       120000., 820000., 250000., 130000., 136375., 141000., 142800.,
       116626., 110000., 130000., 444000., 116000., 110000., 113996.,
       160000., 150000., 108000., 120000., 120000., 220000., 117701.,
       120000., 120010., 110000., 103645., 102000., 108977., 680000.,
       152568., 114800., 150000., 111841., 110000., 108000., 102000.,
       134449., 120000., 101349., 120000., 250000., 175000., 261546.,
       120000., 103000.])
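
np.where also has a three-argument form that chooses element by element between two alternatives. The sketch below uses hypothetical values; capping at 100000 is just an illustration.

```python
import numpy as np

# Hypothetical values standing in for the salary data
values = np.array([60000., 261546., 48000., 120000.])

# With one argument, np.where returns a tuple of index arrays
idx = np.where(values > 100000)
print(idx[0].tolist())     # [1, 3]

# With three arguments, it picks element-wise between two alternatives
capped = np.where(values > 100000, 100000, values)
print(capped.tolist())     # [60000.0, 100000.0, 48000.0, 100000.0]
```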

extract

The extract() function returns the elements that satisfy a given condition

Start off by defining a condition
condition = salaries > np.mean(salaries)
Use the condition in the extract function
np.extract(condition, salaries)
array([ 60000.,  58000.,  56967.,  70000.,  75000.,  62000.,  56000.,
       261546.,  77000., 500000.,  77570.,  60000.,  65000.,  60000.,
       120000.,  81000.,  75000.,  75000.,  73000.,  75000., 210000.,
        90000., 105227.,  84000.,  98000.,  70000., 115000.,  72000.,
        57400.,  70000.,  71796., 130773., 103529.,  57000.,  70000.,
        75000.,  57000., 119875.,  60000., 132000., 100000., 141671.,
        87500.,  90000.,  65000., 109100.,  98080.,  87689., 550000.,
        56697.,  60000., 180000.,  92000.,  63000., 220000.,  66000.,
        80000.,  65460.,  65000., 109294.,  70000.,  57758.,  75000.,
        75000.,  62662.,  64000.,  90000.,  60000.,  60000.,  70000.,
        63000.,  80000.,  70000.,  60000.,  90000., 135380.,  65000.,
        70000., 103529.,  64000.,  70000., 120000., 185000.,  70000.,
       850000.,  61500.,  60000.,  63000., 181339.,  75665.,  58000.,
        57000., 100000.,  65000.,  62662.,  90000.,  70835.,  75000.,
       111600., 102000.,  60000.,  65000., 625000., 100000.,  78000.,
        75000.,  60000.,  64000.,  63000.,  59656.,  69000.,  80750.,
        84703., 110464., 102000.,  56732.,  63000.,  62000.,  60000.,
        70151.,  60000.,  56400.,  65000., 140000.,  75000.,  65000.,
       120000., 820000.,  80000., 250000.,  58000., 130000.,  85000.,
        60000.,  65000.,  71925., 136375.,  90000., 141000.,  90000.,
        65000.,  66000.,  72000.,  65000., 142800.,  68000., 116626.,
        65000.,  56000.,  57758.,  58000.,  60000.,  60000.,  97000.,
        67382.,  90000.,  85000.,  75000.,  57000.,  60000.,  76600.,
        60000., 110000., 130000.,  61398.,  60000.,  65000.,  70000.,
       100000.,  59000.,  57000.,  60000.,  76000.,  60000.,  88000.,
       444000.,  59000.,  70000.,  72000., 100000.,  70000., 116000.,
        65000., 110000.,  70000.,  82000.,  56000.,  60000.,  72000.,
        60000.,  70000., 113996., 160000.,  60000.,  81733.,  66000.,
        80000.,  66000., 150000.,  56000., 108000.,  62031., 100000.,
        60000.,  68000.,  60000.,  60000.,  60000.,  90000.,  61700.,
        75000.,  77000.,  96000.,  60000.,  86000.,  68000.,  63000.,
        65000.,  60000.,  90000.,  90000.,  80000.,  63000.,  80000.,
        70915., 120000.,  73500.,  60000.,  60000., 120000.,  56000.,
        84396.,  95000.,  94704.,  71000.,  70000.,  60000., 220000.,
       117701.,  57758.,  80000.,  65000.,  72000.,  70000., 120000.,
        60000.,  60000.,  63500.,  60000.,  59640.,  61027.,  70000.,
        61800., 120010.,  83500.,  75000.,  72000.,  84000., 110000.,
        60005.,  59500.,  70518.,  66000.,  58000.,  70000.,  99500.,
        66858., 103645.,  77000.,  56000.,  62000.,  60000.,  57400.,
        92735.,  58000.,  69000., 102000.,  68000., 108977.,  75653.,
        68000.,  59938., 680000.,  56967.,  57600.,  75000.,  60000.,
        60000.,  58000., 152568.,  98324.,  65000.,  63605.,  82000.,
        57000.,  62000.,  85000.,  60000.,  62500.,  68656.,  87280.,
        60000.,  57400.,  76000.,  63207.,  59000., 114800.,  60000.,
        72000., 150000.,  70000.,  60000.,  57000.,  70000.,  59938.,
       111841., 110000.,  63000.,  63207.,  59000.,  64000.,  58000.,
        75000.,  75000.,  70500.,  61000.,  60555., 108000.,  60000.,
        80000., 102000., 100000.,  64800.,  62000., 134449.,  75000.,
        70000.,  65000.,  94000.,  65000.,  60000., 120000.,  80000.,
        95000.,  65000., 101349.,  74000.,  70000.,  76289.,  95000.,
        78000.,  72000.,  92000.,  65000., 120000.,  72000., 250000.,
        68500.,  62000.,  81000.,  70000.,  84000.,  56000.,  87689.,
       175000., 261546.,  59000.,  56000., 120000.,  82000.,  70000.,
        71380.,  62000., 100000.,  80000., 103000.,  57000.,  76800.,
        75000.,  90000.,  70000.])
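
For a 1-D array, np.extract gives the same result as indexing with the boolean condition directly. A brief sketch with hypothetical values:

```python
import numpy as np

# Hypothetical values; their mean is 50750
values = np.array([60000., 25000., 70000., 48000.])
condition = values > np.mean(values)   # boolean mask

by_extract = np.extract(condition, values)
by_mask = values[condition]

print(by_extract.tolist())                    # [60000.0, 70000.0]
print(np.array_equal(by_extract, by_mask))    # True
```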

Vectors and Matrices

Vectors

  • A vector has magnitude (size) and direction
  • Use NumPy to create a one-dimensional array
  • A vector can be created as a row or a column using NumPy

vectors

See More
  • https://www.mathsisfun.com/algebra/vectors.html
  • https://en.wikipedia.org/wiki/Euclidean_vector

VECTOR CREATION

# Load NumPy Library 
import numpy as np 
# Create a vector as row 
vector_row = np.array([1, 2, 3])
print(vector_row)
[1 2 3]
# Create a vector as column 
vector_column = np.array([[1], [2], [3]]) 
print(vector_column) 
[[1]
 [2]
 [3]]
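
Magnitude and direction can be computed with np.linalg.norm. The sketch below uses a hypothetical vector [3, 4] and also shows how reshape turns a 1-D array into an explicit row or column.

```python
import numpy as np

v = np.array([3.0, 4.0])

magnitude = np.linalg.norm(v)    # Euclidean length: sqrt(3**2 + 4**2)
direction = v / magnitude        # unit vector pointing the same way

print(magnitude)                 # 5.0
print(direction)                 # [0.6 0.8]

# A 1-D array has no row/column orientation; reshape makes it explicit
column = v.reshape(-1, 1)        # shape (2, 1)
row = v.reshape(1, -1)           # shape (1, 2)
print(column.shape, row.shape)   # (2, 1) (1, 2)
```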

Matrix

  • In mathematics, a matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns
  • Rows run horizontally and columns run vertically
  • Use NumPy to create a two-dimensional array

matrix

Matrix Order

  • You can think of an r x c matrix as a set of r row vectors, each having c elements; or you can think of it as a set of c column vectors, each having r elements.

  • The rank of a matrix is defined as (a) the maximum number of linearly independent column vectors in the matrix or (b) the maximum number of linearly independent row vectors in the matrix. Both definitions are equivalent.

  • If r is less than c, then the maximum rank of the matrix is r.

  • If r is greater than c, then the maximum rank of the matrix is c.

  • wikipedia

  • wolfram

  • stattrek
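
The rank can be computed with np.linalg.matrix_rank. The sketch below uses a 3 x 4 example matrix whose rows are not linearly independent, so its rank is lower than the maximum of 3; the variable m is illustrative.

```python
import numpy as np

# 3 x 4 example: each row is the previous row plus [3, 3, 3, 3]
m = np.array([[1, 2, 3, 4],
              [4, 5, 6, 7],
              [7, 8, 9, 10]])

print(m.shape)                     # (3, 4), so the rank is at most 3
rank = np.linalg.matrix_rank(m)
print(rank)                        # 2

# The third row is a combination of the first two: row2 = 2*row1 - row0
print(np.array_equal(m[2], 2 * m[1] - m[0]))   # True
```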

Create a matrix using matrix()

  • Returns a matrix from an array-like object or a string of data.
  • Syntax: np.matrix(data)
mat1 = np.matrix("1, 2, 3, 4; 4, 5, 6, 7; 7, 8, 9, 10")
print(mat1)
[[ 1  2  3  4]
 [ 4  5  6  7]
 [ 7  8  9 10]]

Create a matrix using array()

  • Returns a two-dimensional array (ndarray) that represents the matrix
  • Syntax: np.array(object)
mat2 = np.array([[1, 2], [3,4], [4, 6]])
print(mat2) 
[[1 2]
 [3 4]
 [4 6]]

Matrix Properties

Shape

  • Returns number of rows and columns from a matrix
  • Syntax: mat.shape
    • shape[0] - returns the number of rows
    • shape[1] - returns the number of columns
mat3 = np.matrix("1, 2, 3, 4; 4, 5, 6, 7; 7, 8, 9, 10")
# shape 
mat3.shape
(3, 4)
# rows 
mat3.shape[0]
3
# columns 
mat3.shape[1]
4

Size

  • Returns the number of elements from a matrix
  • Syntax: array.size
mat4 = np.matrix("1, 2, 3, 4; 4, 5, 6, 7; 7, 8, 9, 10")
# size 
mat4.size
12

Modifying matrix using insert()

  • Adds values at a given position and axis in a matrix
  • Syntax: np.insert(matrix, object, values, axis)
    • matrix - input matrix
    • object - index position
    • values - matrix of values to be inserted
mat5 = np.matrix("1, 2, 3, 4; 4, 5, 6, 7; 7, 8, 9, 10")
print(mat5)
[[ 1  2  3  4]
 [ 4  5  6  7]
 [ 7  8  9 10]]
# adding a new matrix `col_new` as a new column to mat5
col_new = np.matrix("1, 1, 1")
print(col_new)
[[1 1 1]]

INSERTION

# insert at column 
mat6 = np.insert(mat5, 0, col_new, axis=1)
print(mat6) 
[[ 1  1  2  3  4]
 [ 1  4  5  6  7]
 [ 1  7  8  9 10]]
# adding a new matrix `row_new` as a new row to mat5
row_new = np.matrix("0, 0, 0, 0")
print(row_new)
[[0 0 0 0]]
# insert at row 
mat7 = np.insert(mat5, 0, row_new, axis=0)
print(mat7)
[[ 0  0  0  0]
 [ 1  2  3  4]
 [ 4  5  6  7]
 [ 7  8  9 10]]

Modifying matrix using index

  • Elements of matrix can be modified using index number
  • Syntax: mat[row_index, col_index]
mat_a = np.matrix("1, 2, 3, 4, 5; 5, 6, 7, 8, 9; 9, 10, 11, 12, 13")
print(mat_a)
[[ 1  2  3  4  5]
 [ 5  6  7  8  9]
 [ 9 10 11 12 13]]
# change 6 with 0 
mat_a[1, 1] = 0 
# show mat_a 
print(mat_a)
[[ 1  2  3  4  5]
 [ 5  0  7  8  9]
 [ 9 10 11 12 13]]

EXTRACTION

# extract 2nd row 
mat_a[1, :]
matrix([[5, 0, 7, 8, 9]])
# extract 3rd column
mat_a[:, 2]
matrix([[ 3],
        [ 7],
        [11]])
# extract elements 
mat_a[1, 2]
7

Matrix Operations

A = np.arange(0, 20).reshape(5,4)
print(A)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
B = np.arange(20, 40).reshape(5,4)
print(B)
[[20 21 22 23]
 [24 25 26 27]
 [28 29 30 31]
 [32 33 34 35]
 [36 37 38 39]]

Addition

  • np.add() - performs element-wise addition between two matrices
  • Syntax: np.add(matrix_1, matrix_2)
# addition 
np.add(A, B)
array([[20, 22, 24, 26],
       [28, 30, 32, 34],
       [36, 38, 40, 42],
       [44, 46, 48, 50],
       [52, 54, 56, 58]])

Subtraction

  • np.subtract() - performs element-wise subtraction between two matrices.
  • Syntax: np.subtract(matrix_1, matrix_2)
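
A short sketch of subtraction on the same A and B used for addition. Every element of B is 20 larger than the matching element of A, so every difference is -20.

```python
import numpy as np

A = np.arange(0, 20).reshape(5, 4)
B = np.arange(20, 40).reshape(5, 4)

diff = np.subtract(A, B)
print(diff[0].tolist())               # [-20, -20, -20, -20]

# The - operator performs the same element-wise operation
print(np.array_equal(diff, A - B))    # True
```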

Transpose

  • np.transpose() - Permute the dimensions of an array.
  • Transposing an \(M \times N\) matrix flips it around the center diagonal and results in an \(N \times M\) matrix.
  • Syntax: np.transpose(matrix)

EXAMPLE

A = np.arange(0, 20).reshape(5,4)
print(A)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
# transpose 
np.transpose(A)
array([[ 0,  4,  8, 12, 16],
       [ 1,  5,  9, 13, 17],
       [ 2,  6, 10, 14, 18],
       [ 3,  7, 11, 15, 19]])

Multiplication

  • np.dot() - performs matrix multiplication between two matrices.
  • Syntax: np.dot(matrix_1, matrix_2)

# multiplication
np.dot(A,B) 
ValueError: shapes (5,4) and (5,4) not aligned: 4 (dim 1) != 5 (dim 0)

Note
  • For matrix multiplication, the number of columns in matrix \(A\) must equal the number of rows in matrix \(B\).
  • Here, the order of matrix \(A\) is \(5 \times 4\) and the order of matrix \(B\) is \(5 \times 4\).
  • \(A\) has 4 columns and \(B\) has 5 rows, and \(4 \neq 5\).
  • That’s why it shows ValueError: shapes (5,4) and (5,4) not aligned: 4 (dim 1) != 5 (dim 0)

# transpose matrix B to make it 4x5 in dimension
T = np.transpose(B)
print(T)
[[20 24 28 32 36]
 [21 25 29 33 37]
 [22 26 30 34 38]
 [23 27 31 35 39]]

DOT OPERATION

# now we can perform multiplication
np.dot(A,T)
array([[ 134,  158,  182,  206,  230],
       [ 478,  566,  654,  742,  830],
       [ 822,  974, 1126, 1278, 1430],
       [1166, 1382, 1598, 1814, 2030],
       [1510, 1790, 2070, 2350, 2630]])
# using matmul 
np.matmul(A, T)
array([[ 134,  158,  182,  206,  230],
       [ 478,  566,  654,  742,  830],
       [ 822,  974, 1126, 1278, 1430],
       [1166, 1382, 1598, 1814, 2030],
       [1510, 1790, 2070, 2350, 2630]])
# using @ operator 
A @ T 
array([[ 134,  158,  182,  206,  230],
       [ 478,  566,  654,  742,  830],
       [ 822,  974, 1126, 1278, 1430],
       [1166, 1382, 1598, 1814, 2030],
       [1510, 1790, 2070, 2350, 2630]])

Element-wise multiplication

  • np.multiply() - performs element-wise multiplication between two matrices.
  • Syntax: np.multiply(matrix1, matrix2)
# element-wise multiplication 
np.multiply(A, B)
array([[  0,  21,  44,  69],
       [ 96, 125, 156, 189],
       [224, 261, 300, 341],
       [384, 429, 476, 525],
       [576, 629, 684, 741]])

Division

  • np.divide() - performs element-wise division between two matrices.
  • Syntax: np.divide(matrix_1, matrix_2)
# division
np.divide(A, B)
array([[0.        , 0.04761905, 0.09090909, 0.13043478],
       [0.16666667, 0.2       , 0.23076923, 0.25925926],
       [0.28571429, 0.31034483, 0.33333333, 0.35483871],
       [0.375     , 0.39393939, 0.41176471, 0.42857143],
       [0.44444444, 0.45945946, 0.47368421, 0.48717949]])

Statistics

import numpy as np 
# 1D array 
A = np.arange(20)
print(A)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
A.ndim
1
# 2D array 
A2 = np.array([[11, 12, 13], [21, 22, 23]])
print(A2)
[[11 12 13]
 [21 22 23]]
A2.ndim
2

Sum

  • Sum of array elements over a given axis.
    • Syntax: np.sum(array); array-wise sum
    • Syntax: np.sum(array, axis=0); row-wise sum
    • Syntax: np.sum(array, axis=1); column-wise sum

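Axis 0 is the first dimension (the “rows”), and axis 1 is the second dimension (the “columns”). In this guide, “row-wise” (axis=0) means the operation runs down the rows and returns one value per column, while “column-wise” (axis=1) means it runs across the columns and returns one value per row.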
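
One way to keep the axes straight is to look at which dimension disappears from the shape. The sketch below uses the A2 array defined above. The names col_totals and row_totals are illustrative only.

```python
import numpy as np

A2 = np.array([[11, 12, 13], [21, 22, 23]])   # shape (2, 3)

# axis=0 collapses the 2 rows, leaving one total per column
col_totals = np.sum(A2, axis=0)

# axis=1 collapses the 3 columns, leaving one total per row
row_totals = np.sum(A2, axis=1)

print(A2.shape, col_totals.shape, row_totals.shape)
print(col_totals, row_totals)
```

The axis passed to the function is the one that is removed from the result, which applies equally to the other reductions in this section.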

Examples

# sum of 1D array `A`
np.sum(A)
190
# array-wise sum of 2D array 
np.sum(A2)
102
# sum of 2D array(axis=0, row-wise sum)
np.sum(A2, axis=0)
array([32, 34, 36])
# sum of 2D array(axis=1, column-wise sum)
np.sum(A2, axis=1)
array([36, 66])

Mean

  • Compute the arithmetic mean along the specified axis.

  • Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.

    • Syntax: np.mean(array); array-wise mean
    • Syntax: np.mean(array, axis=0); row-wise mean
    • Syntax: np.mean(array, axis=1); column-wise mean
# compute the average of array `A`
np.mean(A)
9.5
# mean of 2D array(axis=0, row-wise)
np.mean(A2, axis=0)
array([16., 17., 18.])
# mean of 2D array(axis=1, column-wise)
np.mean(A2, axis=1)
array([12., 22.])

Median

  • Compute the median along the specified axis.

  • Returns the median of the array elements.

    • Syntax: np.median(array); array-wise median
    • Syntax: np.median(array, axis=0); row-wise median
    • Syntax: np.median(array, axis=1); column-wise median
# compute the median of `A`
np.median(A)
9.5
# median of 2D array(axis=0, row-wise)
np.median(A2, axis=0)
array([16., 17., 18.])
# median of 2D array(axis=1, column-wise)
np.median(A2, axis=1)
array([12., 22.])

Minimum

  • Return the minimum of an array or minimum along an axis.

    • Syntax: np.min(array); array-wise min
    • Syntax: np.min(array, axis=0); row-wise min
    • Syntax: np.min(array, axis=1); column-wise min
# minimum value of `A`
np.min(A)
0
# minimum value of A2(axis=0, row-wise)
np.min(A2, axis=0)
array([11, 12, 13])
# minimum value of A2(axis=1, column-wise)
np.min(A2, axis=1)
array([11, 21])

Maximum

  • Return the maximum of an array or maximum along an axis.

    • Syntax: np.max(array); array-wise max
    • Syntax: np.max(array, axis=0); row-wise max
    • Syntax: np.max(array, axis=1); column-wise max
# maximum value of `A`
np.max(A)
19
# maximum value of A2(axis=0, row-wise)
np.max(A2, axis=0)
array([21, 22, 23])
# maximum value of A2(axis=1, column-wise)
np.max(A2, axis=1)
array([13, 23])

Range

  • Syntax: np.max(array) - np.min(array)
r = np.max(A) - np.min(A)
print(r)
19
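
As a hedged alternative, NumPy also provides np.ptp ("peak to peak"), which computes the same max minus min in one call. The sketch below assumes the 1D array A from this section. The names r_manual and r_ptp are illustrative.

```python
import numpy as np

A = np.arange(20)

# range computed by hand, as in the syntax above
r_manual = np.max(A) - np.min(A)

# the same quantity via the peak-to-peak helper
r_ptp = np.ptp(A)

print(r_manual, r_ptp)
```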

Standard Deviation

  • Compute the standard deviation along the specified axis.
  • Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
    • Syntax: np.std(array); array-wise std
    • Syntax: np.std(array, axis=0); row-wise std
    • Syntax: np.std(array, axis=1); column-wise std
# compute the standard deviation of `A`
np.std(A)
5.766281297335398
# standard deviation of 2D array(axis=0, row-wise)
np.std(A2, axis=0)
array([5., 5., 5.])
# standard deviation of 2D array(axis=1, column-wise)
np.std(A2, axis=1)
array([0.81649658, 0.81649658])

Variance

  • Compute the variance along the specified axis.
  • Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
    • Syntax: np.var(array); array-wise var
    • Syntax: np.var(array, axis=0); row-wise var
    • Syntax: np.var(array, axis=1); column-wise var
# compute the variance of `A`
np.var(A)
33.25
# variance of 2D array(axis=0, row-wise)
np.var(A2, axis=0)
array([25., 25., 25.])
# variance of 2D array(axis=1, column-wise)
np.var(A2, axis=1)
array([0.66666667, 0.66666667])
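
By default, np.std and np.var divide by N (the population formulas). The ddof parameter switches to the sample formulas, which divide by N - ddof. A minimal sketch using the 1D array A from this section follows. The variable names are illustrative.

```python
import numpy as np

A = np.arange(20)

pop_var = np.var(A)             # divides by N (ddof=0, the default)
sample_var = np.var(A, ddof=1)  # divides by N - 1

# the standard deviation is the square root of the variance
pop_std = np.std(A)

print(pop_var, sample_var)
print(np.isclose(pop_std, np.sqrt(pop_var)))
```

For this array the population variance is 33.25 and the sample variance comes out to about 35. Which one is appropriate depends on whether the data is the whole population or a sample from it.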

Quantile

  • Compute the q-th quantile of the data along the specified axis, where q is a fraction between 0 and 1 (for example, 0.25 for the 25th percentile).
    • Syntax: np.quantile(array, q); array-wise quantile
    • Syntax: np.quantile(array, q, axis=0); row-wise quantile
    • Syntax: np.quantile(array, q, axis=1); column-wise quantile
# 25th percentile of `A`
np.quantile(A, 0.25)
4.75
# 50th percentile of `A2`(axis=0)
np.quantile(A2, 0.5, axis=0)
array([16., 17., 18.])
# 75th percentile of `A2`(axis=1)
np.quantile(A2, 0.75, axis=1)
array([12.5, 22.5])
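
A short sketch of two related conveniences: np.percentile takes the same request on a 0 to 100 scale, and q can be a list to get several quantiles at once. It assumes the 1D array A from this section. The names q25, p25 and quartiles are illustrative.

```python
import numpy as np

A = np.arange(20)

q25 = np.quantile(A, 0.25)    # q given as a fraction
p25 = np.percentile(A, 25)    # the same point, given as a percentage

# several quantiles in a single call
quartiles = np.quantile(A, [0.25, 0.5, 0.75])

print(q25, p25)
print(quartiles)
```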

Correlation Coefficient

# compute Correlation Coefficient
np.corrcoef(A2)
array([[1., 1.],
       [1., 1.]])
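
By default np.corrcoef treats each row as one variable, so for A2 it compares the rows [11, 12, 13] and [21, 22, 23]. They differ only by a constant shift, which is why every entry is 1. The sketch below uses made-up values x and y (not from the guide) to show a negative correlation.

```python
import numpy as np

# illustrative data: y falls exactly as x rises
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# 2x2 matrix: diagonal is each variable with itself,
# off-diagonal is the correlation between x and y
r = np.corrcoef(x, y)
print(r)
```

The off-diagonal entries are -1 here because y is a perfectly decreasing linear function of x.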

Linear Algebra

What is Linear Algebra?

Linear algebra is the branch of mathematics concerning linear equations such as \(a_1 x_1 + \cdots + a_n x_n = b\), linear maps such as \((x_1, \ldots, x_n) \mapsto a_1 x_1 + \cdots + a_n x_n\), and their representations in vector spaces and through matrices. Linear algebra is central to almost all areas of mathematics.

Applications of Linear Algebra in Data Science

  • Loss Functions
  • Regularization
  • Covariance Matrix
  • Support Vector Machine Classification
  • Principal Component Analysis (PCA)
  • Singular Value Decomposition
  • Word Embeddings
  • Latent Semantic Analysis (LSA)
  • Image Representation as Tensors
  • Convolution and Image Processing

SEE DETAILS: https://www.analyticsvidhya.com/blog/2019/07/10-applications-linear-algebra-data-science/

Linear Algebra Operations

Determinant of matrix

  • The determinant is a special number that can be calculated from a square matrix. For a \(2 \times 2\) matrix \(A\): \[A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \det(A) = ad - bc\]

  • np.linalg.det() - computes the determinant of a matrix.

  • Syntax: np.linalg.det(matrix)

Source: https://www.mathsisfun.com/algebra/matrix-determinant.html
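
The \(2 \times 2\) formula can be checked against np.linalg.det with a small sketch. The matrix M and the names by_hand and by_numpy are hypothetical values chosen for illustration.

```python
import numpy as np

# illustrative 2x2 matrix [[a, b], [c, d]]
M = np.array([[1, 2],
              [3, 4]])
a, b = M[0]
c, d = M[1]

by_hand = a * d - b * c        # ad - bc
by_numpy = np.linalg.det(M)    # returned as a float

print(by_hand, by_numpy)
print(np.isclose(by_hand, by_numpy))
```

np.linalg.det works in floating point, so its result can differ from the exact integer by a tiny rounding error. Comparing with np.isclose rather than == avoids false mismatches.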

MATRIX

# create matrix A
A = np.matrix("4, 5, 16, 7; 2,-3,2,3; 3,4,5,6; 4,7,8,9")
print(A)
[[ 4  5 16  7]
 [ 2 -3  2  3]
 [ 3  4  5  6]
 [ 4  7  8  9]]
# create matrix B
B = np.matrix("4,5,6,7;2,-3,3,3; 3,4,5,6; 4, 7,8,9")
print(B)
[[ 4  5  6  7]
 [ 2 -3  3  3]
 [ 3  4  5  6]
 [ 4  7  8  9]]

DETERMINANT

# determinant of A 
np.linalg.det(A) 
128.00000000000009
# determinant of B 
np.linalg.det(B)
12.000000000000005

Rank of matrix

  • np.linalg.matrix_rank() - returns rank of the matrix
  • Syntax: np.linalg.matrix_rank(matrix)
# rank of matrix A 
np.linalg.matrix_rank(A)
4
# rank of matrix B 
np.linalg.matrix_rank(B)
4

Inverse of a Matrix

  • np.linalg.inv() - returns the multiplicative inverse of a matrix.
  • Syntax: np.linalg.inv(matrix)
# inverse of matrix A 
np.linalg.inv(A)
matrix([[ 9.37500000e-02, -4.68750000e-01,  3.68750000e+00,
         -2.37500000e+00],
        [ 3.53252781e-17, -2.50000000e-01,  5.00000000e-01,
         -2.50000000e-01],
        [ 9.37500000e-02,  3.12500000e-02, -3.12500000e-01,
          1.25000000e-01],
        [-1.25000000e-01,  3.75000000e-01, -1.75000000e+00,
          1.25000000e+00]])
# inverse of matrix B 
np.linalg.inv(B)
matrix([[ 1.50000000e+00, -2.08166817e-17, -1.00000000e+00,
         -5.00000000e-01],
        [ 2.50000000e-01, -1.66666667e-01, -3.33333333e-01,
          8.33333333e-02],
        [ 1.00000000e+00,  3.33333333e-01, -3.33333333e+00,
          1.33333333e+00],
        [-1.75000000e+00, -1.66666667e-01,  3.66666667e+00,
         -9.16666667e-01]])
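
A quick way to sanity-check an inverse is to multiply it by the original matrix and compare the product with the identity. The sketch below rebuilds matrix A as a plain np.array (a swap from the np.matrix used above, made only for this illustration). A_inv and check are illustrative names.

```python
import numpy as np

A = np.array([[4,  5, 16, 7],
              [2, -3,  2, 3],
              [3,  4,  5, 6],
              [4,  7,  8, 9]])

A_inv = np.linalg.inv(A)

# A times its inverse should be the 4x4 identity, up to rounding error
check = A @ A_inv
print(np.allclose(check, np.eye(4)))
```

Entries such as 3.53252781e-17 in the printed inverse are rounding noise that stands in for an exact zero, which is why the comparison uses np.allclose.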

System of linear equations

Consider a system of equations \[3x + y + 2z = 2\] \[3x + 2y + 5z = -1\] \[6x + 7y + 8z = 3\]

Now we can write the equations in the form of \(Ax = b\)

  • np.linalg.matrix_rank() - returns rank of the matrix
  • Syntax: np.linalg.matrix_rank(matrix)
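
A minimal sketch of how the system above might be set up and solved follows. It assumes np.linalg.solve as the solver, which the text up to this point does not name. The names rank and solution are illustrative.

```python
import numpy as np

# coefficient matrix and right-hand side of Ax = b
A = np.array([[3, 1, 2],
              [3, 2, 5],
              [6, 7, 8]])
b = np.array([2, -1, 3])

# a rank equal to the number of unknowns means a unique solution exists
rank = np.linalg.matrix_rank(A)

# assumption: solve the square system directly
solution = np.linalg.solve(A, b)

print(rank)
print(solution)
print(np.allclose(A @ solution, b))   # substitute back to verify
```

Solving by hand gives \(x = 41/33\), \(y = 9/11\), \(z = -14/11\), which matches the numerical result up to rounding.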